Preface


TCC 2012, the 9th Theory of Cryptography Conference, was held in Taormina 
(Sicily), Italy, during March 19-21, 2012. It was sponsored by the International 
Association for Cryptologic Research (IACR). The General Chairs were Nelly 
Fazio and Rosario Gennaro. The Local Arrangements Chair was Dario Catalano. 

By the deadline of September 15, 2011, the Program Committee (PC) had re- 
ceived 131 electronic submissions. As usual, the selection process was carried out 
using a Web-based interface and consisted of three phases. In the review phase 
each submission was assigned to at least three PC members for independent 
review (six in the case of a PC member submission). In the discussion phase, 
reviews pertaining to the same submission were compared and agreement on a 
common view was reached where possible. Additional reviews were solicited as 
needed. The perceived relative merits of all submissions were taken into con- 
sideration as a basis for the selection phase. By the December 1 deadline for 
notification of decisions, the PC had selected 36 submissions for (20-min.) pre- 
sentation at the conference. Reviewer comments for all submissions were sent 
out to their respective authors soon after. These proceedings contain the revised 
versions of the 36 selected submissions, as received by January 3, 2012. These 
revised versions were not subjected to further review by the PC and authors 
bear full responsibility for contents. 

The program also featured two invited (60-min.) talks. Jens Groth and Sergey 
Yekhanin treated us to excellent surveys of Non-Interactive Zero-Knowledge and 
Locally Decodable Codes, respectively. A Best Student Paper Award was shared 
by Nir Bitansky and Omer Paneth for their paper “Point Obfuscation and 3- 
round Zero-Knowledge” and by Anindya De for his paper “Lower Bounds in 
Differential Privacy”. In addition, a traditional Rump Session was held, consist- 
ing of (5-min.) research announcements. It was organized and chaired by Tal 
Malkin. The organizers had this evening session well catered for with nice drinks 
and snacks. 

I thank the PC members for their hard work, as well as the external reviewers. 
Oded Goldreich (TCC Steering Committee Chair) and Yuval Ishai (TCC 2011 
Program Chair) provided quick and helpful advice upon my request as well as 
answers to my questions, for which I am grateful. The PC used Shai Halevi’s ex- 
cellent Web Submission and Review software to handle the submissions. Thanks 
to Maarten Dijkema of CWI’s IT Support for running this software on our system
and for his unwavering assistance, and thanks to Shai for rendering efficient
“customer service” to us. Also, thanks to Tal for running the Rump Session. The 
organizers wish to express their gratitude to the TCC 2012 sponsors Alcatel-Lucent
Bell Labs, IBM Research and Microsoft Research for generous donations
that supported local organization in several ways, including student stipends. In 
turn, I thank Nelly, Rosario and Dario for our pleasant collaboration. Finally, 
thanks to all authors of submissions to TCC 2012. 


January 2012 Ronald Cramer 
Abstract. In tandem with recent progress on computing on encrypted
data via fully homomorphic encryption, we present a framework for com- 
puting on authenticated data via the notion of slightly homomorphic 
signatures, or P-homomorphic signatures. With such signatures, it is 
possible for a third party to derive a signature on the object m’ from a 
signature of m as long as P(m, m’) = 1 for some predicate P which cap- 
tures the “authenticatable relationship” between m’ and m. Moreover, a 
derived signature on m’ reveals no extra information about the parent m. 

Our definition is carefully formulated to provide one unified frame- 
work for a variety of distinct concepts in this area, including arithmetic, 
homomorphic, quotable, redactable, transitive signatures and more. It 
includes being unable to distinguish a derived signature from a fresh 
one even when given the original signature. The inability to link derived 
signatures to their original sources prevents some practical privacy and
linking attacks, a challenge not met by most prior works.

Under this strong definition, we then provide generic constructions for 
all univariate and closed predicates, and specific efficient constructions 
for a broad class of natural predicates such as quoting, subsets, weighted 
sums, averages, and Fourier transforms. To our knowledge, these are the 
first efficient constructions for these predicates (excluding subsets) that 
provably satisfy this strong security notion. 


1 Introduction 


In tandem with recent progress on computing any function on encrypted data,
this work explores computing on unencrypted signed data. In the
past few years, several independent lines of research touched on this area: 


— Quoting/redacting: Given Alice’s signature on some 
message m anyone should be able to derive Alice’s signature on a subset of m. 
Quoting typically applies to signed text messages where one wants to derive 
Alice’s signature on a substring of m. Quoting can also apply to signed images 
where one wants to derive a signature on a subregion of the image (say, a face 
or an object) and to data structures where one wants to derive a signature of 
a subset of the data structure such as a sub-tree of a tree. 

— Arithmetic: Given Alice’s signature on vectors
v_1, ..., v_k ∈ F_p^n anyone should be able to derive Alice’s signature on a vector
v in the linear span of v_1, ..., v_k. Arithmetic on signed data is motivated
by applications to secure network coding [25]. We show that these schemes
can be used to compute authenticated linear operations such as computing
an authenticated weighted sum of signed data and an authenticated Fourier
transform. As a practical consequence of this, we show that an untrusted
database storing signed data (e.g., employee salaries) can publish an authenticated
average of the data without leaking any other information about the
stored data. Recent constructions go beyond linear operations and support
low degree polynomial computations [11].

— Transitivity: [41, 35, 6, 31, 7, 43, 49, 40] Given Alice’s signature on edges in a
graph G anyone should be able to derive Alice’s signature on a pair of vertices 
(u, v) if and only if there is a path in G from u to v. The derived signature on 
the pair (u,v) must be indistinguishable from a fresh signature on (u,v) had 
Alice generated one herself [35]. This requirement ensures that the derived 
signature on (u,v) reveals no information about the path from u to v used 
to derive the signature. 


In this paper, we put forth a general framework for computing on authenticated 
data that encompasses these lines of research and much more. While prior defini- 
tions mostly contained artifacts specific to the type of malleability they supported 
and, thus, were hard to compare to one another, we generalize and strengthen 
these disparate notions into a single definition. This definition can be instanti- 
ated with any predicate, and we allow repeated computation on the signatures 
(e.g., it is possible to quote from a quoted signature). During our study, we
realized that the “privacy” notions offered by many existing definitions are, in our
view, insufficient for some practical applications. We therefore require a stronger
(and seemingly significantly more challenging to achieve) property called context
hiding. Under this definition, we provide two generic solutions for computing
signatures on any univariate, closed predicate; however, these generic constructions
are not efficient. We also present efficient constructions for three problems: quoting
substrings, a subset predicate, and a weighted average over data (which captures
weighted sums and Fourier transforms). Our quoting substring construction
is novel and significantly more efficient than the generic solutions. It is detailed in
Section 4. For the problems of subsets and weighted averages, we show somewhat
surprising connections to respective existing solutions in attribute-based encryption
and network coding signatures in Section 5.


1.1 Overview 


A general framework. Let ℳ be some message space and let 2^ℳ be its powerset.
Consider a predicate P : 2^ℳ × ℳ → {0,1} mapping a set of messages and a
message to a bit. Loosely speaking we say that a signature scheme supports
computations with respect to P if the following holds:


Let M ⊆ ℳ be a set of messages and let m’ be a derived message, namely
m’ satisfies P(M, m’) = 1. Then there exists an efficient procedure that
can derive Alice’s signature on m’ from Alice’s independent signatures
on all of the messages in M.


For the quoting application, the predicate P is defined as P(M, m’) = 1 iff m’
is a quote from the set of messages M. Here we focus on quoting from a single
message m, so that P is false whenever M contains more than one message,
and thus use the notation P(m, m’) as shorthand for P({m}, m’). The predicate
P for arithmetic computations is defined in the full version [1] and essentially
says that P((v_1, ..., v_k), v) is true whenever v is in the span of v_1, ..., v_k.
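
As a concrete illustration of the two predicates just described, the following Python sketch tests the quoting predicate for a single parent message and the span predicate over F_p. The function names (quote_predicate, rank_mod_p, span_predicate) and the small prime p = 7 used in the usage note are assumptions made for this illustration only; the precise arithmetic predicate is the one defined in the full version.

```python
def quote_predicate(m: str, m_prime: str) -> bool:
    """P(m, m') = 1 iff m' is a non-empty contiguous substring of m."""
    return len(m_prime) > 0 and m_prime in m


def rank_mod_p(rows, p):
    """Rank of a matrix over F_p (p prime), by Gauss-Jordan elimination."""
    rows = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p
                           for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def span_predicate(vectors, v, p):
    """P((v_1, ..., v_k), v) = 1 iff v lies in the F_p-span of v_1, ..., v_k."""
    return rank_mod_p(vectors, p) == rank_mod_p(list(vectors) + [list(v)], p)
```

For instance, with p = 7 the vector (2, 3, 6) equals 2·(1, 0, 2) + 3·(0, 1, 3) mod 7, so span_predicate([[1, 0, 2], [0, 1, 3]], [2, 3, 6], 7) holds, while (0, 0, 1) is not in the span. Likewise “I like fish” is not a quote from “I do not like fish”, since only contiguous substrings count.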

We emphasize that signature derivation can be iterative. For example, given
a message-signature pair (m, σ) from Alice, Bob can publish a derived
message-signature pair (m’, σ’) for an m’ where P(m, m’) holds. Charlie, using (m’, σ’),
may further derive a signature σ” on m”. In the quoting application, Charlie is
quoting from a quote, which is perfectly fine.


Security. We give a clean security definition that captures two properties: un- 
forgeability and context hiding. We briefly discuss each in turn and give precise 
definitions in the next section. 


— Unforgeability captures the idea that an attacker may be given various derived
signatures (perhaps iteratively derived) on messages of his choice. The
attacker should be unable to produce a signature on a message that is not
derivable from the set of signed messages in his possession. E.g., suppose
Alice generates (m, σ) and gives it to Bob who then publishes a derived signature
(m’, σ’). Then an attacker given (m’, σ’) should be unable to produce
a signature on m or on any other message m” such that P(m’, m”) = 0.

— Context hiding captures an important privacy property: a signature should
reveal nothing more than the message being signed. In particular, if a signature
on m’ was derived from a signature on m, an attacker should not
learn anything about m other than what can be inferred from m’. This
should be true even if the original signature on m is revealed. For example,
a signed quote should not reveal anything about the message from which
it was quoted, including its length, the position of the quote, whether its
parent document is the same as another quote, whether it was derived from
a given signed message or generated freshly, etc.


1 We leave it for future work to construct systems for securely quoting from two
messages (or possibly more) as defined next.


Defining context hiding is an interesting and subtle task. In the next section,
we give a definition that captures a very strong privacy requirement. We discuss
earlier attempts at defining privacy following our definition in Section 2.3; while
many prior works use a similar-sounding intuition to the one we give above, most
differ fundamentally from ours in their formalization.

We note that notions such as group or ring signatures have 
considered the problem of hiding the identity of a signer among a set of users. 
Context hiding ensures privacy for the data rather than the signer. Our goal is 
to hide the legacy of how a signature was created. 


Efficiency. We require that the size of a signature, whether fresh or derived, 
depend only on the size of the object being signed. This rules out solutions 
where the signature grows with each derivation. 


Generic Approaches. We begin with two generic constructions that can be inefficient.
They apply to closed, univariate predicates, namely predicates P(M, m’)
where M contains a single message (P is false when |M| > 1) and where if
P(a, b) = P(b, c) = 1 then P(a, c) = 1. The first construction uses any standard
signature scheme S where the signing algorithm is deterministic. (One can enforce
determinism using PRFs [28].) To sign a message m in the message space ℳ, one uses S to
sign each message m’ such that P(m, m’) = 1. The signature consists of all these
signature components. To verify a signature for m, one checks the signature component
corresponding to the message m. To derive a signature on m’ from one on m, one
copies the signature components for all m” such that P(m’, m”) = 1. Soundness
of the construction follows from the security of the underlying standard scheme
S and context hiding from the fact that signing in S is deterministic.
Unfortunately, these signatures may become large, consisting of up to |ℳ| signature
components, which affects both the signing time and the signature size. Our
second generic construction alleviates the space burden by using an RSA accumulator.
The construction works in a similar brute force fashion where a signature
on m is an accumulator value on all m’ such that P(m, m’) = 1. While
this produces short signatures, the time required for both verification and
derivation is even worse than in the first generic approach. Thus, these generic
approaches are too expensive for most interesting predicates. We detail these
generic approaches and proofs in the full version [1], where we also discuss a
generic construction using NIZK.
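
The data flow of the first generic construction can be sketched in a few lines of Python for the quoting predicate. This is only a toy under loudly stated assumptions: HMAC-SHA256 stands in for the deterministic signature scheme S, so "verification" here needs the secret key, which a real public-key scheme would not; the function names are likewise hypothetical. The sketch shows only why derivation is a copy operation and why a derived signature is identical to a fresh one.

```python
import hashlib
import hmac


def substrings(m: str) -> set[str]:
    """All non-empty contiguous substrings of m, i.e. all m' with P(m, m') = 1."""
    return {m[i:j] for i in range(len(m)) for j in range(i + 1, len(m) + 1)}


def P(m: str, m_prime: str) -> bool:
    return m_prime in substrings(m)


def S_sign(key: bytes, msg: str) -> bytes:
    # ASSUMPTION: HMAC-SHA256 is a stand-in for a deterministic signature S.
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def sign(key: bytes, m: str) -> dict[str, bytes]:
    # One signature component for every message derivable from m.
    return {x: S_sign(key, x) for x in substrings(m)}


def derive(sig: dict[str, bytes], m: str, m_prime: str) -> dict[str, bytes]:
    # Copy the components for every m'' derivable from m'.
    if not P(m, m_prime):
        raise ValueError("m' is not derivable from m")
    return {x: sig[x] for x in substrings(m_prime)}


def verify(key: bytes, m: str, sig: dict[str, bytes]) -> bool:
    # Check only the component that corresponds to m itself.
    return m in sig and hmac.compare_digest(sig[m], S_sign(key, m))
```

Because S_sign is deterministic, derive(sign(key, m), m, m’) returns exactly the same dictionary as sign(key, m’), which is the context-hiding argument in miniature. The cost is also visible: a signature on n distinct characters already carries n(n+1)/2 components.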


Our Quoting Construction. We turn to more efficient constructions. First, we
set out to construct a signature for quoting substrings, which although conceptually
simple is non-trivial to realize securely. As an efficiency baseline, we note
that the brute force generic construction of the quoting predicate would result in
n^2 components for a signature on n characters. So any interesting construction
must perform more efficiently than this. We prove our construction selectively
secure (see footnote 3). In addition, we give some potential future directions for
achieving adaptive security and removing the use of random oracles.

Our construction uses bilinear groups to link different signature components
together securely, but in such a way that the context can be hidden by a re-randomizing
step in the derivation algorithm. A signature in our system on a
message of length n consists of n lg n group elements, intuitively organized as
lg n group elements assigned to each character. To derive a new signature on a
substring of ℓ characters, one roughly removes the group elements not associated
with the new substring and then re-randomizes the remaining part of the
signature. This results in a new signature of ℓ lg ℓ group elements. The technical
challenge consists in simultaneously allowing re-randomization and preserving
the “linking” between successive characters. In addition, there is a second option
in our derive algorithm that allows for the derivation of a short signature
of lg ℓ group elements; however, the derive procedure cannot be applied again to
this short signature. Thus, we support quoting from quotes, and also provide a
compression option which produces a very short quote, but the price for this is
that it cannot be quoted from further.
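
To get a feel for the gap between the brute-force baseline and the n lg n figure, the following Python sketch compares the two growth rates. Both functions are illustrative assumptions: the first counts index pairs (i, j) with i ≤ j, which is the Θ(n^2) baseline, and the second simply evaluates n·⌈lg n⌉ with all constants ignored, so neither is an exact signature size for any concrete scheme.

```python
import math


def brute_force_components(n: int) -> int:
    """Number of substrings x_i ... x_j with 1 <= i <= j <= n (Theta(n^2))."""
    return n * (n + 1) // 2


def powers_of_two_elements(n: int) -> int:
    """n * ceil(lg n): the n lg n shape, ignoring constant factors."""
    return n * math.ceil(math.log2(n)) if n > 1 else n


if __name__ == "__main__":
    for n in (16, 1024, 65536):
        print(n, brute_force_components(n), powers_of_two_elements(n))
```

For a 1024-character message this gives 524800 components for the baseline against 10240 for the n lg n shape.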


Computing Signatures on Subsets and Weighted Averages. Our final two con- 
tributions are schemes for deriving signatures on subsets and weighted averages 
on signatures. Rather than create entirely new systems, we show connections to 
existing Attribute-Based Encryption schemes and Network Coding Signatures. 
We sketch those constructions in Section 5 and provide further details in [1].


Other Predicates. One can also imagine predicates P that support more complex 
operations on signed messages. One natural set of examples are spreadsheet 
operations such as median, standard deviation, and rounding on signed data 
(satisfying unforgeability and context hiding). Other examples include graph 
algorithms such as computing a signature on a perfect matching in a signed 
bipartite graph. 


2 A substring of x_1 ... x_n is some x_i ... x_j where i, j ∈ [1, n] and i ≤ j. We emphasize
that we are not considering subsequences. Thus, it is not possible, in this setting, to
extract a signature on “I like fish” from one on “I do not like fish”.

3 Following an analog of [20], selective security for signatures requires the attacker to 
give the forgery message before seeing the verification key. 
2 Definitions


Definition 1 (Derived messages). Let ℳ be a message space and let P :
2^ℳ × ℳ → {0,1} be a predicate from sets over ℳ and a message in ℳ
to a bit. We say that a message m’ is derivable from the set M ⊆ ℳ if
P(M, m’) = 1. We denote by P*(M) the set of messages derivable from M by
repeated derivation. That is, let P^0(M) be the set of messages derivable from M
and for i > 0 let P^i(M) be the set of messages derivable from P^{i−1}(M). Then
P*(M) := ∪_{i=0}^{∞} P^i(M).

We define the closure of P, denoted P*, as the predicate defined by P*(M, m) =
1 iff m ∈ P*(M).
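
For a finite message universe, the closure of Definition 1 can be computed by simply iterating the derivation step until a set repeats. The Python sketch below does this; the helper names and the toy predicate trim_one (a message is derivable if it is already in the set or results from dropping one end character) are assumptions chosen so that the closure is easy to check by hand, and are not part of the paper.

```python
def closure(P, universe, M):
    """Return P*(M) over a finite universe: the union of P^0(M), P^1(M), ..."""
    seen = set()
    current = frozenset(M)
    result = set()
    while current not in seen:
        seen.add(current)
        # One round of derivation: everything derivable from the current set.
        current = frozenset(m for m in universe if P(current, m))
        result |= current
    return result


def trim_one(S, m_prime):
    """Toy predicate: m' is in S, or is some m in S with one end character dropped."""
    return m_prime != "" and any(m_prime in (m, m[1:], m[:-1]) for m in S)


def substrings(m):
    return {m[i:j] for i in range(len(m)) for j in range(i + 1, len(m) + 1)}


if __name__ == "__main__":
    universe = substrings("abcd")
    print(sorted(closure(trim_one, universe, {"abcd"})))
```

Here a single round from {"abcd"} yields only "abcd", "bcd" and "abc", but repeated derivation reaches every substring, so the closure of trim_one coincides with the substring predicate on this universe.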


A P-homomorphic signature scheme Π for message space ℳ and predicate P is
a triple of PPT algorithms:


KeyGen(1^λ): The key generation algorithm outputs a key pair (pk, sk). We
treat the secret key sk as a signature on the empty tuple ε ∈ ℳ*. We also
assume that pk is embedded in sk.


SignDerive(pk, ({σ_m}_{m∈M}, M), m’, w): The algorithm takes as input the public
key, a set of messages M ⊆ ℳ and corresponding signatures {σ_m}_{m∈M}, a derived
message m’ ∈ ℳ, and possibly some auxiliary information w. It produces a
new signature σ’ or a special symbol ⊥ to represent failure. For complicated
predicates P, the auxiliary information w serves as a witness that P(M, m’) = 1.
To simplify the notation we often drop w as an explicit argument.

As shorthand we write Sign(sk, m) := SignDerive(pk, (sk, ε), m, ·) to denote
that any message can be derived when the original signature is the signing
key. For a set of messages M = {m_1, ..., m_k} ⊆ ℳ* it is convenient to let
Sign(sk, M) denote independently signing each of the k messages, namely:


Sign(sk, M) := ( Sign(sk, m_1), ..., Sign(sk, m_k) ) .


Verify(pk, m, σ): Given a public key, message, and purported signature σ, the
algorithm returns 1 if the signature is valid and 0 otherwise.

We assume that testing m ∈ ℳ can be done efficiently, and that Verify
returns 0 if m ∉ ℳ.


Correctness. We require that for all key pairs (pk, sk) generated by KeyGen(1^λ)
and for all M ∈ ℳ* and m’ ∈ ℳ we have:
— if P(M, m’) = 1 then SignDerive(pk, (Sign(sk, M), M), m’) ≠ ⊥, and
— for all signature tuples {σ_m}_{m∈M} such that σ’ ← SignDerive(pk,
({σ_m}_{m∈M}, M), m’) ≠ ⊥, we have Verify(pk, m’, σ’) = 1.


In particular, correctness implies that a signature generated by SignDerive can 
be used as an input to SignDerive so that signatures can be further derived 
from derived signatures, if allowed by P. 
Derivation efficiency. In many cases it is desirable that the size of a derived
signature depend only on the size of the derived message. This rules out signatures
that expand as one iteratively calls SignDerive. All the constructions in this 
paper are derivation efficient in this sense. 


Definition 2 (Derivation-Efficient). A signature scheme is derivation-efficient
if there exists a polynomial p such that for all (pk, sk) ← KeyGen(1^λ),
set M ⊆ ℳ*, signatures {σ_m}_{m∈M} ← Sign(sk, M) and derived messages m’
where P(M, m’) = 1, we have |SignDerive(pk, {σ_m}_{m∈M}, M, m’)| ≤ p(λ, |m’|).


2.1 Security: Unforgeability 


To define unforgeability, we extend the basic notion of existential unforgeability 
with respect to adaptive chosen-message attacks [29]. The definition captures 
the idea that if the attacker is given a set of signed messages (either primary 
or derived) then the only messages he can sign are derivations of the signed 
messages he was given. This is defined using a game between a challenger and
an adversary A with respect to scheme Π over message space ℳ.


— Game Unforg(Π, A, λ, P):
Setup: The challenger runs KeyGen(1^λ) to obtain (pk, sk) and sends pk to A.
The challenger maintains two sets T and Q that are initially empty.
Queries: Proceeding adaptively, the adversary issues the following queries to
the challenger:
— Sign(m ∈ ℳ): the challenger generates a unique handle h, runs Sign(sk, m) →
σ and places (h, m, σ) into a table T. It returns the handle h to the adversary.
— SignDerive(h = (h_1, ..., h_k), m’): the oracle retrieves the tuples (h_i, σ_i, m_i)
in T for i = 1, ..., k, returning ⊥ if any of them do not exist. Let M :=
(m_1, ..., m_k) and {σ_m}_{m∈M} := {σ_1, ..., σ_k}. If P(M, m’) holds, then the
oracle generates a new unique handle h’, runs SignDerive(pk, ({σ_m}_{m∈M},
M), m’) → σ’ and places (h’, m’, σ’) into T, and returns h’ to the adversary.
— Reveal(h): Returns the signature σ’ corresponding to handle h, and adds
(σ’, m’) to the set Q.
Output: Eventually, the adversary outputs a pair (σ’, m’). The output of the
game is 1 (i.e., the adversary wins the game) if:
— Verify(pk, m’, σ’) = 1 and,
— let M ⊆ ℳ be the set of messages in Q then P*(M, m’) = 0 where P*
is the closure of P from Definition 1.
Else, the output of the game is 0. Define Forg_A as the probability
Pr[Unforg(Π, A, λ, P) = 1].


Interestingly, for some predicates it may be difficult to test if the adversary won 
the game. For all the predicates we consider in this paper, this will be quite easy. 
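
The bookkeeping of the game (handles, the table T, the revealed set Q, and the winning condition) can be sketched as a small Python harness. Everything here is an illustrative assumption rather than the paper's construction: the predicate is single-parent quoting (so each SignDerive query names one handle, and P* = P because the substring relation is closed), the underlying "signature" is the toy HMAC-based brute-force scheme, and returning None stands for ⊥.

```python
import hashlib
import hmac
import itertools


def substrings(m):
    return {m[i:j] for i in range(len(m)) for j in range(i + 1, len(m) + 1)}


def P(m, m_prime):
    # Quoting predicate: m' is a non-empty contiguous substring of m.
    return m_prime in substrings(m)


class Challenger:
    def __init__(self, key):
        self._key = key
        self._T = {}      # handle -> (message, signature)
        self._Q = set()   # messages whose signatures were revealed
        self._handles = itertools.count(1)

    def _tag(self, m):
        # ASSUMPTION: HMAC-SHA256 stands in for a deterministic signature.
        return hmac.new(self._key, m.encode(), hashlib.sha256).digest()

    def _sign(self, m):
        return {x: self._tag(x) for x in substrings(m)}

    def sign(self, m):
        h = next(self._handles)
        self._T[h] = (m, self._sign(m))
        return h                      # only the handle is returned

    def sign_derive(self, h, m_prime):
        if h not in self._T:
            return None               # plays the role of the failure symbol
        m, sig = self._T[h]
        if not P(m, m_prime):
            return None
        h2 = next(self._handles)
        self._T[h2] = (m_prime, {x: sig[x] for x in substrings(m_prime)})
        return h2

    def reveal(self, h):
        m, sig = self._T[h]
        self._Q.add(m)
        return sig

    def verify(self, m, sig):
        return m in sig and hmac.compare_digest(sig[m], self._tag(m))

    def adversary_wins(self, m_prime, sig):
        # Win: valid signature on a message not derivable from anything revealed.
        derivable = any(P(m, m_prime) for m in self._Q)
        return self.verify(m_prime, sig) and not derivable
```

A typical run signs “I like fish”, derives a handle for the quote “like”, and reveals only the quote. Submitting the revealed signature on “like” is not a win, and the same signature does not verify on the parent message, which matches the attack the definition is meant to rule out.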


Definition 3 (Unforgeability). A P-homomorphic signature scheme Π is
unforgeable with respect to adaptive chosen-message attacks if for all PPT
adversaries A, the function Forg_A is negligible in λ.
A P-homomorphic signature scheme Π is selectively unforgeable with respect
to adaptive chosen-message attacks if for all PPT adversaries A who begin the
above game by announcing the message m’ on which they will forge, Forg_A is
negligible in λ.


Properties of the definition. By taking P to be the equality oracle, namely 
P(x,y) = 1 iff x = y, we obtain the standard unforgeability requirement for 
signatures. 

Notice that Sign and SignDerive queries return handles, but do not return the
actual signatures. A system proven secure under this definition adequately rules
out the following attack: suppose (m, σ) is a message-signature pair and (m’, σ’)
is a message-signature pair derived from it, namely σ’ = SignDerive(pk, σ,
m, m’). For example, suppose m’ is a quote from m. Then given (m’, σ’) it
should be difficult to produce a signature on m, and indeed our definition treats
a signature on m as a valid forgery.

The unforgeability game imposes some constraints on P: (1) P must be
reflexive, i.e. P(m, m) = 1 for all m ∈ ℳ, (2) P must be monotone, i.e.
P(M, m’) ⇒ P(M’, m’) where M ⊆ M’. It is easy to see that predicates that do
not satisfy these requirements cannot be realized under Definition 3.


2.2 Security: Context Hiding (a.k.a., Privacy) 


Let M be some set and let m’ be a derived message from M (i.e., P(M,m’) = 1). 
Context hiding captures the idea that a signature on m’ derived from signatures 
on M should reveal no information about M beyond what is revealed by m’. For 
example, in the case of quoting, a signature on a quote from m should reveal 
nothing more about m: not the length of m, not the position of the quote in m, 
etc. The same should hold even if the attacker is given signatures on multiple 
quotes from m. 

We put forth the following powerful statistical definition of context hiding 
and discuss its implications following the definition. We were most easily able to 
leverage a statistical definition for our proofs, although we also give an alternative 
computational definition in the full version [1].


Definition 4 (Strong Context Hiding). Let M ⊆ ℳ* and m’ ∈ ℳ be
messages such that P(M, m’) = 1. Let (pk, sk) ← KeyGen(1^λ) be a key pair.
A signature scheme (KeyGen, SignDerive, Verify) is strongly context hiding
(for predicate P) if for all such triples ((pk, sk), M, m’), the following two
distributions are statistically close:


{ (sk, {σ_m}_{m∈M} ← Sign(sk, M), Sign(sk, m’)) }_{sk, M, m’}
{ (sk, {σ_m}_{m∈M} ← Sign(sk, M), SignDerive(pk, ({σ_m}_{m∈M}, M), m’)) }_{sk, M, m’}


The distributions are taken over the coins of Sign and SignDerive. Without
loss of generality, we assume that pk can be computed from sk.
The definition states that a derived signature on m’, from an honestly-generated
original signature, is statistically indistinguishable from a fresh signature on 
m’. This implies that a derived signature on m’ is indistinguishable from a 
signature generated independently of M. Therefore, the derived signature cannot 
(provably) reveal any information about M beyond what is revealed by m’. By 
a simple hybrid argument the same holds even if the adversary is given multiple 
derived signatures from M. 

Moreover, Definition 4 requires that a derived signature look like a fresh signature
even if the original signature on M is known. Hence, if for example someone
quotes from a signed recommendation letter and somehow the original signed 
recommendation letter becomes public, it would be impossible to link the signed 
quote to the original signed letter. The same holds even if the signing key sk is 
leaked. 

Thus, Definition 4 captures a broad range of privacy requirements for derived
signatures. Earlier work in this area [32, 16, 18, 15] only considered weaker privacy
requirements using more complex definitions. The simplicity and breadth
of Definition 4 is one of our key contributions.

Definition 4 uses statistical indistinguishability, meaning that even an unbounded
adversary cannot distinguish derived signatures from newly created
ones. In the full version [1], we give a definition using computational indistinguishability,
which is considerably more complex since the adversary needs to be
given signing oracles. In the unbounded case of Definition 4, the adversary can
simply recover a secret key sk from the public key and answer its own signature
queries, which greatly simplifies the definition of context hiding. All the signature
schemes in this paper satisfy the statistical Definition 4.

As mentioned above, the context-hiding guarantee applies to all derivations 
that begin with an honestly-generated signature. One might imagine a scenario 
where a malicious signer creates a signature that passes the verification algo- 
rithm, but contains a “watermark” that allows the signer to detect if other 
signatures are derived from it. To prevent such attacks from malicious signers, 
we could alter the definition so that indistinguishability holds for any derivative 
that results from a signature that passed the verification algorithm. 


A simpler approach to proving unforgeability. For systems that are strongly context
hiding, unforgeability follows from a simpler game than the one defined above.
In particular, it suffices to just give the adversary the ability to obtain top level
signatures signed by sk. In the full version [1], we define this simpler unforgeability
game and prove equivalence to Definition 3 using strong context hiding.


2.3 Related Work 


Early work on quotable signatures [45)32/38)37/300 7121/15] supports quoting
from a single document, but does not achieve the privacy or unforgeability properties
we are aiming for. For example, if simple quoting of messages is all that is
desired, then the following folklore solution would suffice: simply sign the Merkle 
hash of a document. A quote represents some sub-tree of the Merkle hash; so 
a quoter could include enough intermediate hash nodes along with the original
signature in any quote. A verifier could simply hash the quote, and then build 
the Merkle hash tree using the computed hash and the intermediate hashes, and 
compare with the original signature. Notice, however, that every quote in this 
scheme reveals information about the original source document. In particular, 
each quote reveals information about where in the document it appears. Thus, 
this simple quoting scheme is not context hiding in our sense. 
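
The leak in the folklore Merkle solution is easy to see in code. The Python sketch below is a minimal illustration under stated assumptions: it quotes a single leaf rather than a sub-tree, requires a power-of-two number of leaves, omits the signature on the root entirely, and uses hypothetical helper names and domain-separation bytes. It shows that the authentication path a verifier needs also spells out the position of the quote, and that its length reveals the size of the document.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """All tree levels, leaf hashes first; len(leaves) must be a power of two."""
    level = [H(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Authentication path for one leaf: (sibling_hash, sibling_is_left) pairs."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def root_from_path(leaf, path):
    """Recompute the root from a quoted leaf and its authentication path."""
    node = H(b"\x00" + leaf)
    for sibling, sibling_is_left in path:
        if sibling_is_left:
            node = H(b"\x01" + sibling + node)
        else:
            node = H(b"\x01" + node + sibling)
    return node


def leaked_position(path):
    """The left/right pattern of the path is the leaf index written in binary."""
    return sum(1 << depth
               for depth, (_, sibling_is_left) in enumerate(path)
               if sibling_is_left)
```

A verifier who accepts the quote by comparing root_from_path with the signed root can run leaked_position on the very same path and learn where in the document the quote sits, and 2 to the power len(path) gives the number of leaves.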

The work whose definition is closest to what we envision is the recent work
on redacted signatures of Chang et al. [21] and Brzuska et al. [15] (see also Naccache
[39, p. 63] and Boneh-Freeman [11]). However, there is a subtle, but
fundamental difference between their definition and the privacy notion we are
aiming for. In our formulation, a quoted signature should be indistinguishable
from a fresh signature, even when the distinguisher is given the original signature.
(We capture this by an even stronger game where a derived signature is
distributed statistically close to a fresh signature.) In contrast, the definitions
of [21, 15, 12, 11] do not provide the distinguisher with the original signature. Thus,
it may be possible to link a quoted document to its original source (and indeed
it is in the constructions of [21, 15, 12, 11]), which can have negative privacy implications.
Overcoming such document linkage while maintaining unforgeability
is a real technical challenge. This requires moving beyond techniques that use
nonces to link parts of messages.

Indeed, in most prior constructions, such as [21, 15], nonces are used to prevent
“mix-and-match” attacks (e.g., forming a “quote” using pieces of two different
messages). Unfortunately, these nonces reveal the history of derivation, since they
cannot change during each derivation operation. Arguably, much of the technical 
difficulty in our current work comes precisely from the effort to meet our definition 
and hide the lineage. We introduce new techniques in this work which link pieces 
together using randomness that can be re-randomized in controlled ways. 

Another line of work studies computing on authenticated data by holders of 
secret information. Examples include sanitizable signatures, which
allow a proxy to compute signatures on related messages but require the proxy
to have a secret key, and incremental signatures [4], where the signer can efficiently
make small edits to his signed data. In contrast, our proposal is more
along the lines of homomorphic encryption and Rivest’s vision [41], where anyone 
can compute on the authenticated data. 


4 As acknowledged in Section 2.2 of Boneh-Freeman [11], our definitional notion is
stronger than and predates the “weak context hiding” notion of [11]. Indeed, the fact
that [11] uses our framework lends support to its generality, and the fact that they
could not achieve our context hiding notion highlights its difficulty. Their “weak”
definition, which is equivalent to [15], only ensures privacy when the original signatures
remain hidden. In their system, signature derivation is deterministic and
therefore once the original signatures become public it is easy to tell where the
derived signature came from. Our signatures achieve full context hiding so that
derived signatures remain private no matter what information is revealed. This is
considerably harder, and it is not known how to do it for the lattice-based signatures in
Boneh-Freeman.

Bilinear Groups and the CDH Assumption. Let $\mathbb{G}$ and $\mathbb{G}_T$ be groups of prime
order p. A bilinear map is an efficient mapping $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ which is both:
(bilinear) for all $g \in \mathbb{G}$ and $a, b \in \mathbb{Z}_p$, $e(g^a, g^b) = e(g, g)^{ab}$; and (non-degenerate)
if g generates $\mathbb{G}$, then $e(g, g) \neq 1$. We will focus on the Computational
Diffie-Hellman assumption in these groups.


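Assumption 1 (CDH [24]). Let g generate a group $\mathbb{G}$ of prime order $p \in \Theta(2^{\lambda})$.
For all PPT adversaries $\mathcal{A}$, the following probability is negligible in $\lambda$:

$$\Pr[\, a, b \leftarrow \mathbb{Z}_p;\ z \leftarrow \mathcal{A}(g, g^a, g^b) \ :\ z = g^{ab} \,].$$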


4 A Powers-of-2 Construction for Quoting Substrings 


We now provide our main construction for quoting substrings in a text document. 
It achieves the best time/space efficiency trade-off to our knowledge for this 
problem. We will have two different types of signatures called Type I and Type 
II, where a Type I signature can be quoted down to another Type I or Type 
II signature. A Type II signature cannot be quoted any further, but will be a 
shorter signature. The quoting algorithm will allow us to quote anything that is 
a substring of the original message. We point out that the Type I, II signatures 
of this system conform to the general framework given in Section 2. In particular,
we can view a message M as a pair $(t,m) \in \{0,1\} \times \{0,1\}^*$. The bit t will identify
the message as being Type I or Type II (assume t = 1 signifies Type I signatures)
and m will be the quoted substring. The predicate is

$$P(M = (t,m),\ M' = (t',m')) = \begin{cases} 1 & \text{if } t = 1 \text{ and } m' \text{ is a substring of } m; \\ 0 & \text{otherwise.} \end{cases}$$

The bit t′ will indicate whether the new message is Type I or II (i.e., whether
the system can quote further). We note that this description allows an attacker
to distinguish any Type I signature from any Type II signature since
the “type bit” of the messages will be different and thus they will technically 
be two different messages even if the substring components are equal. For this 
reason we will only need to prove context hiding between messages of Type I 
or Type II, but not across types. In general, flipping the bit t will not result 
in a valid signature of a different type on the same core message, because the 
format will be wrong; however, moving from a Type I to a Type II on the same 
core message is not considered a forgery since Type II signatures can be legally 
derived from Type I. 
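
The quoting predicate can be sketched in a few lines of Python. This is only an illustration of the definition above; the function name `quote_predicate` is hypothetical, and rejecting the empty quote is an assumption (quotes in this section have length at least 1).

```python
def quote_predicate(M, M_prime):
    """Return 1 if M' = (t', m') may be derived from M = (t, m), else 0."""
    t, m = M
    _t_prime, m_prime = M_prime
    # Assumption: an empty quote is not allowed (Python treats "" as a substring).
    if len(m_prime) == 0:
        return 0
    # Only Type I messages (t = 1) can be quoted, and m' must occur contiguously in m.
    return 1 if t == 1 and m_prime in m else 0
```

For instance, a Type I message on “abcdefghijklmn” admits both the Type II quote “defgh” and the Type I quote “cdefghi”, while nothing can be derived from a Type II message.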

For presentational clarity, we will split the description of our quoting algo- 
rithm into two quoting algorithms for quoting to Type I and to Type II sig- 
natures; likewise we will split the description of our verification algorithm into 
two separate verification algorithms, one for each type of signature. The type of 
signature used or created (i.e., bit t) will be implicit in the description. 

Fig. 1. The top diagram represents a signature on “abcdefghijklmn” with length N =
14. Each arrow corresponds to some group elements in the construction. Logically,
whenever the elements corresponding to an arrow are included in a quoted signature,
the characters underneath this arrow are included in the quoted message. The bold
path through the top diagram shows how to construct a Type II signature on “defgh”;
it is very short, but cannot be re-quoted. The gray box in this figure shows how to
construct a Type I signature on “cdefghi” of length $\ell = 7$; it includes all the arrows in
the lower figure and can be re-quoted. A technical challenge is to enforce that following 
the arrows is the only way to form a valid signature. Details are below. 


Notation: We use notation $m_{i,j}$ to denote the substring of m of length j starting
at position i.


Intuition: We begin by giving some intuition. We design Type I signatures that
allow re-quoting and Type II signatures that cannot be further quoted, but are
ultra-short. For an original message of length n, our signature structure should
be able to accommodate starting at any position $1 \le i \le n$ and quoting any
substring of length $1 \le \ell \le (n - i + 1)$.⁵

To (roughly) see how this works for a message of length n, visualize (n + 1)
columns with ($\lfloor \lg n \rfloor + 2$) rows as in Figure 1. The columns correspond to the
characters of the message, so if the 14-character message is “abcdefghijklmn”
then there are 15 columns, with a character in between each pair of adjacent
columns. The rows correspond to the numbers $\lfloor \lg n \rfloor$ down to 0, plus an extra
row at the bottom.⁶ Each location in the matrix (except along the bottom-most row)
contains one or more out-going arrows. We’ll establish rules for when these arrows
exist and where each arrow ends shortly.

5 Technically, our predicate P(m, m′) will take the quote from the first occurrence
of substring m′ in m, but for the moment imagine that we allowed quoting from
anywhere in m.

A Type II quote will trace a ($\lg n + 1$)-length path on these arrows through this
matrix starting in a row (with outgoing arrows) of the column that begins the
quote and ending in the lowest row of the first column after the quote ends. The
starting row corresponds to the largest power of two less than or equal to the
length of the desired quote. E.g., to quote “bcdef”, start in row 2 immediately
to the left of ‘b’ (because $2^2 = 4$ is the largest power of two not exceeding 5) and
end in the lowest row immediately to the right of ‘f’. Intuitively, taking an arrow
over a character includes it in the quote. A Type II quote on “defgh” is illustrated in
Figure 1.

A technical challenge is to make this an $O(\lg n)$-length path rather than an $O(n)$-length
path. To do this, the key insight is to view the length of any possible quote
as the sum of powers of two and to allow arrows that correspond to covering the
quote in pieces of size corresponding to one operand of the sum at a time. Each
location $(i_c, i_r)$ in the matrix (except the bottom-most row) contains:


— a “start” arrow: an arrow that goes down one row and over $2^{i_r}$ columns,
ending in $(i_c + 2^{i_r},\ i_r - 1)$ if this end point is in the matrix. This adds all
characters from position $i_c$ to $i_c + 2^{i_r} - 1$ to the quoted substring, effectively
adding the largest power-of-two-length prefix of the quote characters. This
arrow indicates that the quote starts here. These are represented as $S_{i,j}, \tilde{S}_{i,j}$
pairs in our construction.

— a “one” arrow: operates similarly to a start arrow and is used to include
characters after a start arrow has included the quote prefix. These are represented as
$A_{i,j}, \tilde{A}_{i,j}$ pairs in our construction.

— a “zero” arrow: an arrow that goes straight down one row, ending in $(i_c,\ i_r - 1)$.
This does not add any characters to the quoted substring. These are
represented as $D_{i,j}, \tilde{D}_{i,j}$ pairs in our construction.


A Type II quote always starts with a start arrow and then contains one and zero
arrows according to the binary representation of the length of the quote. In our
example of original message “abcdefghijklmn”, we have 15 columns and 5 rows.
We will logically divide our desired substring “bcdef” (length $5 = 2^2 + 2^0 = 4 + 1$)
into its powers-of-two components “bcde” (length $4 = 2^2$) and “f” (length
$1 = 2^0$). To form the Type II quote, we start in row 2 (since $4 = 2^2$) of column
2 (to the left of ‘b’) and take the start arrow ($S_{2,2}$) to row 1 of column 6, take
the zero arrow ($D_{6,1}$) to row 0 of column 6, and then take the one arrow ($A_{6,0}$)
to the lowest row of column 7. The arrows “pass over” the characters “bcdef”.
Figure 1 illustrates this for quote “defgh”.
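
The path of a Type II quote can be traced mechanically from the arrow rules above. The following Python sketch is illustrative only: the function name `type2_path` and the tuple encoding of arrows are hypothetical, and it assumes that a start or one arrow leaving $(i_c, i_r)$ lands in column $i_c + 2^{i_r}$ while a zero arrow stays in the same column.

```python
def type2_path(k, length):
    """List the arrows of a Type II quote of `length` symbols starting at position k.

    Each entry is (kind, column, row), where kind is 'S' (start arrow),
    'A' (one arrow) or 'D' (zero arrow). Also returns the final column.
    """
    if k < 1 or length < 1:
        raise ValueError("positions and lengths start at 1")
    beta = length.bit_length() - 1       # starting row: floor(lg(length))
    path = [("S", k, beta)]
    col = k + 2 ** beta                  # the start arrow passes over 2^beta symbols
    for j in range(beta - 1, -1, -1):    # remaining bits of the length, high to low
        if (length >> j) & 1:
            path.append(("A", col, j))   # one arrow: passes over 2^j more symbols
            col += 2 ** j
        else:
            path.append(("D", col, j))   # zero arrow: straight down, no symbols added
    return path, col                     # col is the first column after the quote
```

For the quote “defgh” of Figure 1 (position 4, length 5), the sketch yields a start arrow at column 4 in row 2, a zero arrow at column 8 in row 1, and a one arrow at column 8 in row 0, ending in column 9. The path always has $\lfloor \lg \ell \rfloor + 1$ arrows for a quote of length $\ell$.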

For a quote of length $\ell$, the elements on this $O(\lg \ell)$-length path of arrows
form a very short Type II signature. For Type I signatures, we include all the
elements corresponding to all arrows that make connections within the columns
corresponding to the quote. We illustrate this in Figure 1. This allows quoting
of quotes with a signature size of $O(\ell \lg \ell)$.

6 The lowest row is intentionally not assigned a number. The second lowest row is row
0. We do this so that row i can correspond to a jump of length $2^i$.

It is essential for security that the signature structure and data algorithm 
enforce that the quoting algorithm be used and not allow an attacker to “splice” 
together a quote from different parts of the signature. We realize this by adding 
in random “chaining” variables. To cancel these out and obtain a well-formed
Type II quote, a user must, intuitively, follow the prescribed procedure
(i.e., following the arrows is the only way to form a valid quote).


The Construction: We now describe our algorithms. While Sign is simply 
a special case of the SignDerive algorithm, we will explicitly provide both
algorithms here for clarity.


KeyGen($1^{\lambda}$) : The algorithm selects a bilinear group $\mathbb{G}$ of prime order $p > 2^{\lambda}$
with generator g. Let L be the maximum message length supported and
denote $n = \lfloor \lg(L) \rfloor$. Let $H : \{0,1\}^* \to \mathbb{G}$ and $H_s : \{0,1\}^* \to \mathbb{G}$ be the
descriptions of two hash functions that we model as random oracles. Choose
random $z_0, \ldots, z_{n-1}, \alpha \in \mathbb{Z}_p$. The secret key is $(z_0, \ldots, z_{n-1}, \alpha)$ and the
public key is:

$$PK = (H,\ H_s,\ g,\ g^{z_0}, \ldots, g^{z_{n-1}},\ e(g,g)^{\alpha}).$$


Sign(sk, $M = (t,m) \in \{0,1\} \times \Sigma^{\ell \le L}$) : If t = 1, signatures produced by this
algorithm are Type I as described below. If t = 0, the Type II signature can
be obtained by running this algorithm and then running the Quote-Type II
algorithm below to obtain a quote on the entire message. The message space
is treated as $\ell \le L$ symbols from alphabet $\Sigma$.

Recall: we use notation $m_{i,j}$ to denote the substring of m of length j starting
at position i.

For i = 3 to $\ell + 1$ and j = 0 to $\lfloor \lg(i-1) - 1 \rfloor$, choose random values $x_{i,j} \in \mathbb{Z}_p$.
These will serve as our random “chaining” variables, and they should all
“cancel” each other out in our short Type II signatures. By definition, set
$x_{i,-1} := 0$ for all i = 1 to $\ell + 1$.


A signature is comprised of the following values for i = 1 to $\ell$ and j = 0 to
$\lfloor \lg(\ell - i + 1) \rfloor$, for randomly chosen values $r_{i,j} \in \mathbb{Z}_p$:

[start arrow: start and include power j]

$$S_{i,j} = g^{\alpha}\, g^{-x_{i+2^j,\,j-1}}\, H_s(m_{i,2^j})^{r_{i,j}}, \qquad \tilde{S}_{i,j} = g^{r_{i,j}}$$

Together with the following values for i = 3 to $\ell$ and j = 0 to
$\min(\lfloor \lg(i-1) - 1 \rfloor, \lfloor \lg(\ell - i + 1) \rfloor)$, for randomly chosen values $r'_{i,j} \in \mathbb{Z}_p$:

[one arrow: include power j and decrease j]

$$A_{i,j} = g^{x_{i,j}}\, g^{-x_{i+2^j,\,j-1}}\, H(m_{i,2^j})^{r'_{i,j}}, \qquad \tilde{A}_{i,j} = g^{r'_{i,j}}$$

Together with the following values for i = 3 to $\ell + 1$ and j = 0 to $\lfloor \lg(i-1) - 1 \rfloor$,
for randomly chosen values $r''_{i,j} \in \mathbb{Z}_p$:

[zero arrow: decrease j]

$$D_{i,j} = g^{x_{i,j}}\, g^{-x_{i,\,j-1}}\, g^{z_j r''_{i,j}}, \qquad \tilde{D}_{i,j} = g^{r''_{i,j}}$$
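
To make the $O(\ell \lg \ell)$ size of a Type I signature concrete, the index ranges used by Sign can simply be counted: start values for i = 1 to $\ell$, one values for i = 3 to $\ell$, and zero values for i = 3 to $\ell + 1$. The Python sketch below does only that counting; the function name `type1_component_counts` is hypothetical, and each counted component stands for one pair of group elements.

```python
def type1_component_counts(ell):
    """Count the start, one and zero pairs in a Type I signature on ell symbols."""
    def lg(x):
        return x.bit_length() - 1        # floor(lg x) for an integer x >= 1

    # start arrows: i = 1..ell, j = 0..floor(lg(ell - i + 1))
    s = sum(lg(ell - i + 1) + 1 for i in range(1, ell + 1))
    # one arrows: i = 3..ell, j = 0..min(floor(lg(i-1)) - 1, floor(lg(ell - i + 1)))
    a = sum(min(lg(i - 1) - 1, lg(ell - i + 1)) + 1 for i in range(3, ell + 1))
    # zero arrows: i = 3..ell+1, j = 0..floor(lg(i-1)) - 1
    d = sum(lg(i - 1) for i in range(3, ell + 2))
    return s, a, d
```

A message of 3 symbols, for example, needs 5 start pairs, 1 one pair and 2 zero pairs. In general each of the three counts is at most $\ell(\lfloor \lg \ell \rfloor + 1)$.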


We provide an example of how to form Type II signatures from this
construction shortly. To see why our $A_{i,j}$ and $D_{i,j}$ values start at i = 3, note
that Type II quotes at position i of length $2^0 = 1$ symbol include only the
$S_{i,0}$ value, where the $x_{i+1,-1}$ term is 0 by definition. Type II quotes at position
i of length $2^1 = 2$ symbols include the $S_{i,1}$ value plus an additional $D_{i+2,0}$
term to cancel out the $x_{i+2,0}$ value (leaving only $x_{i+2,-1} = 0$). Quotes at
position i of length $2^1 + 1 = 3$ symbols include the $S_{i,1}$ value plus an additional
$A_{i+2,0}$ term to cancel out the $x_{i+2,0}$ value (leaving only $x_{i+3,-1} = 0$).
Since we index strings from position 1, the first position to include an $A_{i,j}$
or $D_{i,j}$ value is i + 2 = 3.
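
The cancellation for a quote of length 3 at position i can be written out as a short worked equation. It is a sketch under an assumed sign convention: the start component carries the chaining value as $g^{-x_{i+2,0}}$ and the one component as $g^{x_{i+2,0}}\, g^{-x_{i+3,-1}}$; the opposite convention cancels in the same way. Here $r$ and $r'$ abbreviate the randomness of $S_{i,1}$ and $A_{i+2,0}$.

```latex
\begin{align*}
B &= S_{i,1} \cdot A_{i+2,0} \\
  &= g^{\alpha}\, g^{-x_{i+2,0}}\, H_s(m_{i,2})^{r}
     \cdot g^{x_{i+2,0}}\, g^{-x_{i+3,-1}}\, H(m_{i+2,1})^{r'} \\
  &= g^{\alpha}\, H_s(m_{i,2})^{r}\, H(m_{i+2,1})^{r'}
     \qquad \text{since } x_{i+3,-1} = 0 .
\end{align*}
```

Only $g^{\alpha}$ and the two hash terms survive, which is exactly what a verifier holding $e(g,g)^{\alpha}$, $g^{r}$ and $g^{r'}$ can check with pairings.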

SignDerive(pk, $\sigma$, M = (t,m), M′ = (t′,m′)) : If P(M, M′) = 0, output $\bot$.
Otherwise, if t′ = 1, output Quote-Type I(pk, $\sigma$, m, m′); if t′ = 0, output
Quote-Type II(pk, $\sigma$, m, m′), where these algorithms are defined below.

Quote-Type I(pk, $\sigma$, m, m′) : The quote algorithm takes a Type I signature
and produces another Type I signature that maintains the ability to be
quoted again. Intuitively, this operation will simply find a substring m′ in m,
keep only the components associated with this substring and re-randomize
them all (both the $x_{i,j}$ and $r_{i,j}$ terms in every component).

If m′ is not a substring of m, then output $\bot$. Otherwise, let $\ell' = |m'|$.
Determine the first index k at which substring m′ occurs in m. Parse $\sigma$ as a
collection of $S_{i,j}, \tilde{S}_{i,j}, A_{i,j}, \tilde{A}_{i,j}, D_{i,j}, \tilde{D}_{i,j}$ values, exactly as would come from
Sign with $\ell = |m|$.

First, we choose re-randomization values (to re-randomize the $x_{i,j}$ terms of
$\sigma$). For i = 3 to $\ell' + 1$ and j = 0 to $\lfloor \lg(i-1) - 1 \rfloor$, choose random values
$y_{i,j} \in \mathbb{Z}_p$. Set $y_{i,-1} := 0$ for all i = 1 to $\ell' + 1$. Later, we will choose $t_{i,j}$
values to re-randomize the $r_{i,j}$ terms of $\sigma$.

The quote signature $\sigma'$ is comprised of the following values:

For i = 1 to $\ell'$ and j = 0 to $\lfloor \lg(\ell' - i + 1) \rfloor$, for randomly chosen $t_{i,j} \in \mathbb{Z}_p$:

$$S'_{i,j} = S_{i+k-1,\,j}\; g^{-y_{i+2^j,\,j-1}}\; H_s(m_{i+k-1,\,2^j})^{t_{i,j}}, \qquad \tilde{S}'_{i,j} = \tilde{S}_{i+k-1,\,j}\; g^{t_{i,j}}$$

Together with the following values for i = 3 to $\ell'$ and j = 0 to
$\min(\lfloor \lg(i-1) - 1 \rfloor, \lfloor \lg(\ell' - i + 1) \rfloor)$, for randomly chosen $t'_{i,j} \in \mathbb{Z}_p$:

$$A'_{i,j} = A_{i+k-1,\,j}\; g^{y_{i,j}}\, g^{-y_{i+2^j,\,j-1}}\; H(m_{i+k-1,\,2^j})^{t'_{i,j}}, \qquad \tilde{A}'_{i,j} = \tilde{A}_{i+k-1,\,j}\; g^{t'_{i,j}}$$

Together with the following values for i = 3 to $\ell' + 1$ and j = 0 to
$\lfloor \lg(i-1) - 1 \rfloor$, for randomly chosen $t''_{i,j} \in \mathbb{Z}_p$:

$$D'_{i,j} = D_{i+k-1,\,j}\; g^{y_{i,j}}\, g^{-y_{i,\,j-1}}\; g^{z_j t''_{i,j}}, \qquad \tilde{D}'_{i,j} = \tilde{D}_{i+k-1,\,j}\; g^{t''_{i,j}}$$

Quote-Type II(pk, $\sigma$, m, m′) : The quote algorithm takes a Type I signature
and produces a Type II signature. If P(m, m′) $\neq$ 1, then output $\bot$. Otherwise,
let k be the first index at which m′ occurs in m and let $\ell' = |m'|$.

A quote is computed from one start value and logarithmically many
subsequent pieces depending on the bits of $\ell'$. All signature pieces must be
re-randomized to prevent attacks on context hiding.

Consider the length $\ell'$ written as a binary string. Let $\beta'$ be the largest index of
$\ell' = |m'|$ that is set to 1, where we start counting with zero as the least
significant bit. That is, set $\beta' = \lfloor \lg(\ell') \rfloor$. Select random values $v, v_{\beta'-1}, \ldots, v_0 \in \mathbb{Z}_p$.
Set the start position as $B := S_{k,\beta'}$ and $k' := k + 2^{\beta'}$. Then, from $j = \beta' - 1$ down
to 0, proceed as follows:

— If the jth bit of $\ell'$ is 1, set $B := B \cdot A_{k',j} \cdot H(m_{k',2^j})^{v_j}$ and $Z_j := \tilde{A}_{k',j} \cdot g^{v_j}$,
and then set $k' := k' + 2^j$;

— If the jth bit of $\ell'$ is 0, set $B := B \cdot D_{k',j} \cdot g^{z_j v_j}$ and $Z_j := \tilde{D}_{k',j} \cdot g^{v_j}$.

To end, re-randomize as $B := B \cdot H_s(m_{k,2^{\beta'}})^{v}$ and $\tilde{S} := \tilde{S}_{k,\beta'} \cdot g^{v}$; output the
quote as

$$\sigma' = (B,\ \tilde{S},\ Z_{\beta'-1}, \ldots, Z_0)$$

Verify(pk, M = (t,m), $\sigma$) : If t = 1, output Verify-Type I(pk, m, $\sigma$). Otherwise,
output Verify-Type II(pk, m, $\sigma$), where these algorithms are defined
immediately below.

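Verify-Type I(pk, m, $\sigma$) : Parse $\sigma$ as the set of $S_{i,j}, \tilde{S}_{i,j}, A_{i,j}, \tilde{A}_{i,j}, D_{i,j}, \tilde{D}_{i,j}$.
Let $\ell = |m|$.
Let $X_{i,j}$ denote $e(g,g)^{x_{i,j}}$. We can compute these values as follows. The value
$X_{i,-1} = 1$, since for all i = 1 to $\ell + 1$, $x_{i,-1} = 0$. For i = 3 to $\ell + 1$ and j = 0 to
$\lfloor \lg(i-1) - 1 \rfloor$, we compute $X_{i,j}$ in the following manner: Let $I = i - 2^{j+1}$ and
$J = j + 1$. Next, compute $X_{i,j} = \big( e(g,g)^{\alpha} \cdot e(H_s(m_{I,2^J}), \tilde{S}_{I,J}) \big) / e(S_{I,J}, g)$.
The verification accepts if and only if all of the following hold:

— for i = 3 to $\ell$ and j = 0 to $\min(\lfloor \lg(i-1) - 1 \rfloor, \lfloor \lg(\ell - i + 1) \rfloor)$,

$$e(A_{i,j}, g) = X_{i,j} / X_{i+2^j,\,j-1} \cdot e(H(m_{i,2^j}), \tilde{A}_{i,j})$$

— and for i = 3 to $\ell + 1$ and j = 0 to $\lfloor \lg(i-1) - 1 \rfloor$,

$$e(D_{i,j}, g) = X_{i,j} / X_{i,\,j-1} \cdot e(g^{z_j}, \tilde{D}_{i,j}).$$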

Verify-Type II(pk, m, $\sigma$) : We give the verification algorithm for Type II
signatures. Parse $\sigma$ as $(B, \tilde{S}, Z_{\beta-1}, \ldots, Z_0)$. Let $\ell = |m|$ and $\beta$ be the index of
the highest bit of $\ell$ that is set to 1. If $\sigma$ does not include exactly $\beta$ $Z_j$ values,
reject. Set C := 1 and $k := 1 + 2^{\beta}$. From $j = \beta - 1$ down to 0, proceed as follows:

— If the jth bit of $\ell$ is 1, set $C := C \cdot e(H(m_{k,2^j}), Z_j)$ and $k := k + 2^j$;

— If the jth bit of $\ell$ is 0, set $C := C \cdot e(g^{z_j}, Z_j)$.

Accept if and only if $e(B, g) = e(g,g)^{\alpha} \cdot e(H_s(m_{1,2^{\beta}}), \tilde{S}) \cdot C$.
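
The interplay of Sign, Quote-Type II and Verify-Type II can be checked in a toy model that offers no security at all. In the Python sketch below every group element $g^a$ is represented by its exponent a modulo a small prime, so a pairing $e(g^a, g^b)$ becomes the product ab and the public and secret keys coincide. All names (`keygen`, `sign`, `quote_type2`, `verify_type2`, the modulus `P`, the hash stand-in `h`) are hypothetical. The sketch also assumes the sign convention in which start components carry $-x$ and one/zero components carry $+x_{i,j}$ followed by $-x$ of the next chaining value, and that the verifier starts its running index at $1 + 2^{\beta}$.

```python
import hashlib
import secrets

P = 2**61 - 1  # toy prime group order (assumption; far too small to be secure)

def h(tag, s):
    """Random-oracle stand-in: map a string to the exponent of a group element."""
    return int.from_bytes(hashlib.sha256(f"{tag}|{s}".encode()).digest(), "big") % P

def lg(x):
    return x.bit_length() - 1            # floor(lg x) for x >= 1; gives -1 for x = 0

def sub(m, i, n):
    return m[i - 1:i - 1 + n]            # m_{i,n}: n symbols from 1-indexed position i

def keygen(L):
    # Exponent model: the key holds alpha and z_0..z_{n-1} in the clear.
    return {"alpha": secrets.randbelow(P), "z": [secrets.randbelow(P) for _ in range(lg(L))]}

def sign(key, m):
    ell, x, sig = len(m), {}, {}
    for i in range(1, ell + 2):
        x[i, -1] = 0
        for j in range(lg(i - 1)):       # j = 0 .. floor(lg(i-1)) - 1 (empty for i < 3)
            x[i, j] = secrets.randbelow(P)
    for i in range(1, ell + 1):          # start arrows
        for j in range(lg(ell - i + 1) + 1):
            r = secrets.randbelow(P)
            sig["S", i, j] = ((key["alpha"] - x[i + 2**j, j - 1] + h("Hs", sub(m, i, 2**j)) * r) % P, r)
    for i in range(3, ell + 1):          # one arrows
        for j in range(min(lg(i - 1) - 1, lg(ell - i + 1)) + 1):
            r = secrets.randbelow(P)
            sig["A", i, j] = ((x[i, j] - x[i + 2**j, j - 1] + h("H", sub(m, i, 2**j)) * r) % P, r)
    for i in range(3, ell + 2):          # zero arrows
        for j in range(lg(i - 1)):
            r = secrets.randbelow(P)
            sig["D", i, j] = ((x[i, j] - x[i, j - 1] + key["z"][j] * r) % P, r)
    return sig

def quote_type2(key, sig, m, mq):
    k = m.find(mq) + 1                   # first occurrence, 1-indexed; 0 if absent
    if k == 0 or not mq:
        return None
    ell_q = len(mq)
    beta = lg(ell_q)
    v = secrets.randbelow(P)
    B, S_t = sig["S", k, beta]
    B, S_t = B + h("Hs", sub(m, k, 2**beta)) * v, S_t + v
    kp, Z = k + 2**beta, {}
    for j in range(beta - 1, -1, -1):
        vj = secrets.randbelow(P)
        if (ell_q >> j) & 1:             # one arrow: include 2^j more symbols
            a, a_t = sig["A", kp, j]
            B, Z[j] = B + a + h("H", sub(m, kp, 2**j)) * vj, (a_t + vj) % P
            kp += 2**j
        else:                            # zero arrow: move down one row only
            d, d_t = sig["D", kp, j]
            B, Z[j] = B + d + key["z"][j] * vj, (d_t + vj) % P
    return B % P, S_t % P, Z

def verify_type2(key, mq, quote):
    if not mq or quote is None:
        return False
    B, S_t, Z = quote
    ell_q = len(mq)
    beta = lg(ell_q)
    if sorted(Z) != list(range(beta)):   # exactly beta values Z_{beta-1}, ..., Z_0
        return False
    C, k = 0, 1 + 2**beta
    for j in range(beta - 1, -1, -1):
        if (ell_q >> j) & 1:
            C += h("H", sub(mq, k, 2**j)) * Z[j]
            k += 2**j
        else:
            C += key["z"][j] * Z[j]
    return B == (key["alpha"] + h("Hs", sub(mq, 1, 2**beta)) * S_t + C) % P
```

In this model the chaining values telescope to zero along the arrow path, so every substring of a signed message yields a quote that verifies, while a quote checked against a different message of the same length fails except with negligible probability over the hash values.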


Theorem 2 (Security under CDH). If the CDH assumption holds in G, 
then the above quotable signature scheme is selectively quote unforgeable and 
context-hiding in the random oracle model. 


In the full version [1], we prove this theorem. We also discuss in detail the
efficiency of this construction, how to remove the random oracle, and how to 
obtain full security. 


5 Subsets and Weighted Averages


For the problems of subsets and weighted averages, we show somewhat surpris- 
ing connections to respective existing solutions in attribute-based encryption 
and network coding signatures. We sketch these constructions here and provide 
further details in the full version of this paper [1].

Briefly, our subset construction extends the concept of Naor [10] who observed 
that every IBE scheme can be transformed into a standard signature scheme by 
applying the IBE KeyGen algorithm as a signing algorithm. Here we show an 
analog for known Ciphertext-Policy (CP) ABE schemes. The KeyGen algorithm 
which generates a key for a set S of attributes can be used as a signing algorithm 
for the set S. For known CP-ABE systems it is straightforward to derive 
a key for a subset S’ of S and to re-randomize the signature/key. To verify a 
signature on S we can apply Naor’s signature-from-IBE idea and encrypt a 
random message X to a policy that is an AND of all the attributes in S and 
see if the signature can be used as an ABE key to decrypt to X. Signatures for 
subsets have been previously considered in 86.4], but without context hiding 
requirements. 

Next, we consider a construction for weighted averages, which captures Fourier
transforms and weighted sums. This is particularly interesting, because so far
we have only constructed schemes for univariate predicates P. We can now give an
example where one computes on multiple signed messages. Let p be a prime, n
a positive integer, and $\mathcal{T}$ a set of tags. The message space $\mathcal{M}$ consists of pairs:

$$\mathcal{M} := \mathcal{T} \times \mathbb{F}_p^n$$

Now, define the predicate P as follows: $P(\varepsilon, m) = 1$ for all $m \in \mathcal{M}$ and⁷

$$P\big( ((t_1, v_1), \ldots, (t_k, v_k)),\ (t, v) \big) = 1 \iff t = t_1 = \cdots = t_k \ \text{ and } \ v \in \mathrm{span}(v_1, \ldots, v_k)$$


Thus, given signatures on vectors $v_1, \ldots, v_k$ grouped together by the tag t,
anyone can create a signature on a linear combination of these vectors. This can
be done iteratively so that given signed linear combinations, new signed linear
combinations can be created. Unforgeability means that if the adversary obtains
signatures on vectors $v_1, \ldots, v_k$ for a particular tag $t \in \mathcal{T}$ then he cannot create
a signature on a vector outside the linear span of $v_1, \ldots, v_k$.
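
Checking the predicate amounts to a span-membership test over $\mathbb{F}_p$, which can be done by comparing ranks. The Python sketch below is an illustration of the predicate only, not of any signature scheme. The function names `rank_mod_p` and `linear_predicate` are hypothetical, and the requirement that all tags match is an assumption about the intended grouping by tag.

```python
def rank_mod_p(rows, p):
    """Rank of a list of equal-length integer vectors over F_p (p prime)."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def linear_predicate(signed, derived, p):
    """Return 1 iff `derived` = (t, v) is derivable from the signed pairs (t_i, v_i)."""
    if not signed:
        return 1                         # P(empty, m) = 1 for every message m
    t, v = derived
    if any(t_i != t for t_i, _ in signed):
        return 0                         # assumption: every tag must equal t
    vectors = [v_i for _, v_i in signed]
    # v lies in span(v_1..v_k) exactly when appending it does not raise the rank.
    return 1 if rank_mod_p(vectors + [v], p) == rank_mod_p(vectors, p) else 0
```

With p = 7 and signed vectors (1, 0, 2) and (0, 1, 3) under one tag, the combination 2·(1, 0, 2) + 3·(0, 1, 3) = (2, 3, 6) satisfies the predicate, while (0, 0, 1) does not.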

Signature schemes for this predicate P are presented in [13,12,11,14,3] while
schemes over $\mathbb{Z}$ (rather than $\mathbb{F}_p$) are presented in [26]. These schemes were
originally designed to secure network coding where context hiding is not needed since
there are no privacy requirements for the sender (in fact, the sender is explicitly
transmitting all his data to the recipient). The question then is how to construct
a system for predicate P above that is both unforgeable and context hiding.
Fortunately, we observe that under the CDH assumption, the linearly homomorphic
signature scheme, NCS, due to Boneh, Freeman, Katz and Waters [13]
is unforgeable and context-hiding in the random oracle model, assuming tags
are generated independently at random by the unforgeability challenger when
responding to Sign queries.

7 Recall, the signature on $\varepsilon$ is the output of the KeyGen algorithm.


Acknowledgments. We are grateful to the anonymous reviewers for their help- 
ful comments. 


Abstract. Motivated by problems in secure multiparty computation
(MPC), we study a natural extension of identifiable secret sharing to 
the case where an arbitrary number of players may be corrupted. An 
identifiable secret sharing scheme is a secret sharing scheme in which the 
reconstruction algorithm, after receiving shares from all players, either 
outputs the correct secret or publicly identifies the set of all cheaters 
(players who modified their original shares) with overwhelming success 
probability. This property is impossible to achieve without an honest ma- 
jority. Instead, we settle for having the reconstruction algorithm inform 
each honest player of the correct set of cheaters. We show that this new 
notion of secret sharing can be unconditionally realized in the presence 
of arbitrarily many corrupted players. We demonstrate the usefulness 
of this primitive by presenting several applications to MPC without an 
honest majority. 


— Complete primitives for MPC. We present the first unconditional con- 
struction of a complete primitive for fully secure function evaluation 
whose complexity does not grow with the complexity of the function 
being evaluated. This can be used for realizing fully secure MPC 
using small and stateless tamper-proof hardware. A previous com- 
pleteness result of Gordon et al. (TCC 2010) required the use of 
cryptographic signatures. 

— Applications to partial fairness. We eliminate the use of cryptogra- 
phy from the online phase of recent protocols for multiparty coin- 
flipping and MPC with partial fairness (Beimel et al., Crypto 2010 
and Crypto 2011). This is a corollary of a more general technique 
for unconditionally upgrading security against fail-stop adversaries 
with preprocessing to security against malicious adversaries. 

Finally, we complement our positive results by a negative result on identifying
cheaters in unconditionally secure MPC. It is known that MPC
without an honest majority can be realized unconditionally in the OT- 
hybrid model, provided that one settles for “security with abort” (Kilian, 
1988). That is, the adversary can decide whether to abort the protocol 
after learning the outputs of corrupted players. We show that such pro- 
tocols cannot be strengthened so that all honest players agree on the 
identity of a corrupted player in the event that the protocol aborts, even 
if a broadcast primitive can be used. This is contrasted with the compu- 
tational setting, in which this stronger notion of security can be realized 
under standard cryptographic assumptions (Goldreich et al., 1987). 


1 Introduction 


Consider a scenario in which n mutually distrustful clients wish to distribute 
a long computation. Instead of directly interacting with each other, they rely 
on a trusted external stateless server. In each invocation, the server receives a 
share of the current state of computation (and possibly an additional input) 
from each client, and returns a share of the new state (and possibly an output) 
to each client. This scenario may apply to distributing sensitive computations 
using servers in the cloud, where requiring servers to maintain state information 
between different invocations is undesirable for security reasons. 

The question we ask is what form of secret sharing is suitable for distributing 
the joint state between the clients. Naturally, we do not want to assume that 
a majority of the clients are honest (this rules out fair [8] or unconditionally 
secure [6] solutions that use direct interaction between the clients and do not 
employ the server). Additively sharing the state fails in protecting the correctness 
of the computation, allowing each client to change the global state without being 
detected. A better solution is to use robust secret sharing that can detect cheating 
(cf. [299] and references therein). When there are three or more clients, this too 
has the disadvantage that it offers no strong deterrent against cheating: while 
cheating does not go undetected, it disrupts the computation without identifying 
a corrupted client. This motivates the use of identifiable secret sharing, where 
a failure of the reconstruction algorithm results in identifying the clients who 
modified their shares. 
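
The weakness of plain additive sharing is easy to see in code. The Python sketch below is illustrative only (the function names and the modulus are hypothetical): any single client can shift the reconstructed state by an amount of its choice, and the tampered shares look exactly as valid as the originals.

```python
import secrets

def share_additively(secret, n, p):
    """Split `secret` into n additive shares modulo p."""
    shares = [secrets.randbelow(p) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares

def reconstruct(shares, p):
    """Additive reconstruction: there is nothing to check, any n values are accepted."""
    return sum(shares) % p

def tamper(shares, client, delta, p):
    """Return a copy of the shares in which one client shifted its own share by delta."""
    forged = list(shares)
    forged[client] = (forged[client] + delta) % p
    return forged
```

Reconstructing from `tamper(shares, client, delta, p)` yields the original secret plus `delta` modulo p, with no indication that anything was changed or by whom.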

Identifiable secret sharing as above can be realized when a majority of the 
clients are honest [22,20,7,24]. But without an honest majority, there is no way
for the server to tell apart cheaters from honest clients. Indeed, n/2 cheaters can 
simulate a consistent sharing of an incorrect secret, which makes it impossible for 
the server to tell which of the two sets of consistent shares is correct. However, 
this does not rule out the alternative of allowing the server to inform each client 
(with negligible error probability) which shares have been modified assuming 
that this client is honest. We refer to this as locally-identifiable secret sharing 
(LISS). Note that except with negligible probability, each honest client will agree 
on which clients are corrupted and should be disqualified. Thus use of LISS to 
share the state minimizes the incentive to cheat and allows the honest clients 
in the event of reconstruction failure to agree on a strict subset of the clients
that includes all honest clients. This subset has the option of restarting the 
computation on their original inputs, using default values for the inputs of the 
remaining clients, without losing in this process any of the honest clients. 

Settling for computational security, LISS can be realized via the use of dig- 
ital signatures: the sharing procedure distributes to all clients the same public 
verification key vk, and gives to each client a signature of its additive share of 
the secret using the corresponding secret key sk. Reconstruction proceeds by 
letting each client send to the server its original share, vk, and the signature on 
the share. The server can then identify definite cheaters as those who supply an 
inconsistent triplet, and partition the remaining clients according to the value 
of vk they provide. In fact, such a computationally secure LISS scheme was im- 
plicitly used by Gordon et al. in the context of defining a complete primitive 
for MPC. The possibility of an unconditionally secure construction remained 
open. This question is motivated not only by the goals of enhancing security 
and eliminating assumptions, but also by the potential efficiency advantages 
of information-theoretic techniques. This is especially significant in applications 
(such as those discussed below) where the share generation process is distributed 
between multiple players. 
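
The server-side logic of the signature-based scheme just described can be sketched as follows. This Python sketch covers only the control flow and makes several assumptions: all names are hypothetical, the `toy_sign`/`toy_verify` pair is an HMAC stand-in and NOT a digital signature scheme (its verification key equals its signing key), and the per-client view, in which everyone outside the client's own vk group is treated as a cheater, is one reading of how each client interprets the server's partition.

```python
import hashlib
import hmac
from collections import defaultdict

def toy_sign(sk, share):
    # Stand-in only: an HMAC is not a signature, since verifying requires the
    # signing key. It is used here just to exercise the control flow.
    return hmac.new(sk, str(share).encode(), hashlib.sha256).digest()

def toy_verify(vk, share, sig):
    return hmac.compare_digest(toy_sign(vk, share), sig)

def server_reconstruct(triplets, verify=toy_verify):
    """triplets maps client id -> (share, vk, sig) as submitted to the server."""
    definite_cheaters = set()
    groups = defaultdict(dict)           # vk -> {client id: share}
    for cid, (share, vk, sig) in triplets.items():
        if verify(vk, share, sig):
            groups[vk][cid] = share
        else:
            definite_cheaters.add(cid)   # inconsistent triplet
    return definite_cheaters, dict(groups)

def view_of_client(own_vk, triplets, p):
    """What a client who trusts only its own vk concludes from the partition."""
    _cheaters, groups = server_reconstruct(triplets)
    trusted = groups.get(own_vk, {})
    suspects = set(triplets) - set(trusted)
    # The additive secret is recovered only if nobody is suspected.
    secret = sum(trusted.values()) % p if not suspects else None
    return secret, suspects
```

A client that alters its share without a matching signature ends up among the definite cheaters, while a coalition that re-signs altered shares under its own key lands in a different vk group and is therefore suspected by every client holding the original key.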


1.1 Our Results 


Constructions. Our main result is an affirmative answer to the above question: 
we present an unconditional construction of an n-out-of-n LISS scheme whose 
security holds in the presence of an arbitrary number of corrupted players. More 
generally, we show how to efficiently transform any secret sharing scheme into 
one in which the reconstruction function reveals to every honest player the
identity of all shares that have been tampered with. In particular, all honest
players agree on the same set of cheaters. 

We also consider a weaker variant of LISS that we call unanimously identifiable 
secret sharing (UISS) in which only the latter agreement property is required. 
That is, if reconstruction fails, all honest players should agree on the same (non- 
empty) set of cheaters. This weaker primitive is easier to construct. (In fact, 
a construction of UISS is implicit in [25].) In contrast to LISS, however, UISS 
does not guarantee that all cheaters are detected in the event that reconstruction 
fails. 


Applications. We present several applications of the above primitives in the 
context of MPC without an honest majority. In the following, the term MPC 
refers to the special case of secure function evaluation, namely MPC of non- 
reactive (stateless) functionalities. We use poly and neg to represent polynomial 
and negligible functions, respectively, and let $\kappa$ denote a statistical security parameter.
While we mainly consider statistical security, our results are also useful in
the domain of computational security. 


COMPLETE PRIMITIVES FOR MPC. It is well known that fully secure MPC 
(with fairness and guaranteed output delivery) is impossible to achieve in general 
without an honest majority [8]. This naturally raises the question of finding a
minimal complete primitive that can be used to get around this limitation. Such 
a primitive is defined by a (stateless) deterministic functionality g mapping n 
inputs to n outputs, such that any n-party functionality f can be realized using 
a trusted instance of g initialized between every tuple of players that can supply 
input to it. The first such results characterized complete boolean primitives for 
MPC with security against a passive adversary [21,19]. In the case of active
adversaries, Fitzi et al. presented a complete primitive for fully secure MPC
whose computational complexity grows linearly with the complexity of f. This left
open the question of finding a “simple” complete primitive, whose complexity
does not depend on the complexity of f. One such primitive was given by Gordon
et al. [13] using digital signatures. We use UISS to get an unconditional variant
of this result. In this variant, the complexity of g only grows with the output 
length of f. 


Theorem 1. There is a deterministic, polynomial-time computable functionality
g with input and output size poly(n, $\kappa$, $\beta$) such that any n-party function f
computed by a circuit of size $\sigma$ and output length $\beta$ can be realized with full
statistical security (and $2^{-\kappa}$ simulation error) using poly(n, $\sigma$) calls to g.


This result has an interesting interpretation in the context of a recent line of 
work on basing cryptography on tamper-proof hardware (see and refer- 
ences therein). In this line of work, several impossibility results in cryptography 
(including UC security, unconditional security, software protection and obfusca- 
tion) were circumvented by using tamper-proof hardware tokens. These works 
spent efforts on minimizing the size of the tokens, employing stateless (rather 
than stateful) tokens, and minimizing or eliminating cryptographic assumptions. 
The above result can be viewed as achieving all these goals simultaneously in the 
context of another major impossibility result: the impossibility of fully secure 
MPC without an honest majority. It implies that a small and stateless token, 
connected via secure channels to the n players, suffices to unconditionally real- 
ize fully secure MPC. We note that connecting the same token to all players is 
necessary, as implied by the results of Fitzi et al. [11].

We also present other variants of the previous completeness theorem which 
rely on computational assumptions but still avoid the use of cryptography in- 
side the primitive. These variants have the advantage of requiring only a small 
number of calls to the primitive (independently of the complexity of f). 


APPLICATIONS TO PARTIAL FAIRNESS. A recent line of works studies the extent 
to which partial fairness can be achieved in MPC without an honest majority. 
Partial fairness can be defined by restricting the simulation error to be small 
(e.g., inverse polynomial) but not negligible [14]. We show that in partially fair 
protocols of Beimel et al. [2] (extending previous two-party protocols of Moran 
et al. and Gordon and Katz [14]), the use of a digital signature scheme can be
replaced by a unanimously identifiable commitment scheme, a second primitive 
we define that can be used as a substitute for LISS in certain applications. This 
yields unconditional multiparty protocols for coin-flipping and MPC with partial 
fairness in the preprocessing model, namely assuming that players have offline
access to correlated randomness. We note that trusted preprocessing does not 
trivialize the problem, because the output needs to be unpredictable in the end 
of the preprocessing phase. In fact, the negative results on achieving full fairness 
apply to the preprocessing model as well. The preprocessing model does allow, 
however, to eliminate the assumptions of secure channels and broadcast, which 
can be implemented unconditionally in the preprocessing model [27]. 

The preprocessing phase can be realized either by a trusted offline dealer or via 
a distributed protocol (possibly employing additional parties for unconditional 
security). Even if one relies on a computationally secure protocol for distributing
the preprocessing phase, the protocols we get have the advantage of making only
black-box use of the underlying cryptographic primitives, whereas the original
protocols make non-black-box use of a one-way function.

In the case of coin-flipping, applying our primitive to the offline dealer protocol 
from [2] implies the following: 


Theorem 2. Assume preprocessing by a trusted off-line dealer. Fix constants n 
and t such that t < 2n/3. Then, for any r, there is an r-round n-party uncon- 
ditionally secure coin-tossing protocol over point-to-point channels tolerating up 
to t malicious players with bias O(1/r). 


Our results on MPC with partial fairness are obtained via a general technique 
for unconditionally upgrading security against fail-stop adversaries to security 
against malicious adversaries where the messages sent by the players are deter- 
mined in the preprocessing stage. 


A Negative Result. It is known that MPC without an honest majority can be
realized unconditionally in the OT-hybrid model, provided that one settles for
“security with abort” [18,16]. That is, the adversary can decide whether to abort
the protocol after learning the outputs of corrupted players but before the honest
players receive their output. We show that such protocols cannot be strengthened
so that all honest players agree on the identity of a corrupted player in the
event that the protocol aborts, even if a broadcast primitive and trusted access
to an arbitrary pairwise functionality are assumed. This is in contrast with the
computational setting, in which this stronger notion of security can be realized
under standard cryptographic assumptions [12]. Our negative result strengthens
a previous negative result from [11], which shows that pairwise functionalities
alone (without broadcast) are not sufficient in general for fully secure n-party
computation. For lack of space, the details of this result are deferred to the full
version.


2 Preliminaries 


Our communication model allows for authenticated point-to-point and broadcast
channels unless specified otherwise. While we define our algorithms in terms of
finite sets (with fixed input size) and fixed error rate, they can be implemented
by uniform algorithms that are polynomial in the bit-length of the inputs, the
number of players, and the statistical security parameter $\kappa$ guaranteeing
$\delta = 2^{-\kappa}$ error. The latter is the default convention whenever no
value of $\delta$ is specified. We only consider non-adaptive adversaries, but our
secret sharing definitions and proofs can easily be extended to the adaptive case.
We denote the n players by $\mathcal{P} = \{P_1, P_2, \ldots, P_n\}$ and will often
identify a player with its index. A collection $\mathbb{A}$ of subsets of
$\mathcal{P}$ will be called monotone if for any $B \in \mathbb{A}$ and any $C$ with
$B \subseteq C \subseteq \mathcal{P}$, we have $C \in \mathbb{A}$. We let $[n]$
denote the set $\{1, \ldots, n\}$. We use $x \stackrel{R}{\leftarrow} X$ to denote a
uniform choice of $x$ from a set $X$.


2.1 Secret Sharing 


We briefly describe our notation for standard secret sharing schemes. A secret 
sharing scheme is defined by a pair of algorithms (Share, Rec), where Share is a
randomized algorithm mapping a secret from $S$ to the share space
$\prod_{i=1}^{n} S_i$, and Rec is a deterministic reconstruction algorithm mapping
the shares of a qualified set of players (along with the identity of this set) to a
secret from $S$. We will refer to $S$ as the secret space and to $S_i$ as the share
space of $P_i$. An access structure is
a monotone collection of player sets. We say that a secret sharing scheme realizes 
an access structure A if sets in A can reconstruct the secret s and others can 
learn nothing about it. Throughout this work we define secret sharing schemes 
to have perfect correctness (authorized sets always correctly reconstruct the 
secret) and perfect secrecy (the shares of unauthorized sets reveal no information 
about the secret). For all additional security guarantees we assume the adversary 
knows the secret that is being shared; even if the secret is compromised, the 
adversary should not be able to cause the reconstruction algorithm to behave 
undesirably (e.g., by outputting an incorrect secret or accusing an honest
player of cheating), except with small probability.

As usual, we consider a single adversary who may corrupt one or more play- 
ers. We distinguish between passive and active corruptions using the following 
terminology. 


Definition 1. (Tampering) A corrupted player is said to have tampered with
its share if it provides to the reconstruction algorithm a share different from the
one assigned by the distribution algorithm. Such a share is called a tampered
share and such a player is called a cheater.


Identifiable Secret Sharing. An identifiable secret sharing scheme is a secret 
sharing scheme in which the reconstruction algorithm can identify all cheaters in 
the event that it fails to reconstruct the secret. The above guarantee should hold 
except with some failure probability $\delta$ as long as there are at most t cheaters
for an additional parameter t. In our definition we assume that the tampering 
is done by a single adversary who can observe the shares of a set C of up to t 
corrupted players and based on this information decide on how to tamper with 
their shares.


Definition 2. (Identifiable Secret Sharing) A secret sharing scheme realizing
$\mathbb{A}$ is $(\delta, t)$-identifiable if for any (unbounded) adversary
$\mathcal{A}$ and any $s \in S$, the success probability of $\mathcal{A}$ in the
following game is at most $\delta$:


1. $(s_1, s_2, \ldots, s_n) \leftarrow \mathsf{Share}(s)$;

2. $\mathcal{A}$ outputs a set $C \subseteq [n]$ such that $|C| \le t$ and receives
   $(s_j)_{j \in C}$;

3. $\mathcal{A}$ outputs $(B, (s'_j)_{j \in C \cap B})$ where $B \in \mathbb{A}$;

4. $\mathsf{Out} \leftarrow \mathsf{Rec}(B, (t_j)_{j \in B})$ where $t_j = s'_j$ if
   $j \in C$ and $t_j = s_j$ otherwise.


$\mathcal{A}$ succeeds if for some $j \in C \cap B$, $s'_j \ne s_j$ and
$\mathsf{Out} \ne (\bot, \{P_i \in C \cap B : s'_i \ne s_i\})$.

The first work on identifiable secret sharing is due to McEliece and Sarwate 
[22] who showed that Shamir’s k-threshold secret sharing scheme allows perfect 
identification if k + 2t players of which at most t are cheaters are involved in re- 
construction. Several works consider various relaxations of identifiability [Z8IBI5] 
which suffice for some applications but are not suitable for MPC with a dishon- 
est majority. There is also substantial work on the efficiency of identifiable secret 
sharing [20,24,17].
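
The McEliece-Sarwate observation can be illustrated with a minimal Python sketch.
It assumes Shamir sharing over a prime field and uses brute-force decoding in place
of an efficient Reed-Solomon decoder; the modulus P and the names shamir_share,
interpolate and rec_identify are illustrative choices of this sketch and are not
taken from [22].

```python
import secrets
from itertools import combinations

P = 2**61 - 1  # prime modulus; an illustrative choice of field

def shamir_share(secret, k, n):
    """Shares of a random polynomial of degree < k with constant term `secret`."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(k - 1)]
    return {i: sum(c * pow(i, e, P) for e, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)}

def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def rec_identify(shares, k, t):
    """Given k + 2t shares with at most t tampered, return (secret, cheaters)."""
    for subset in combinations(sorted(shares), k):
        pts = [(i, shares[i]) for i in subset]
        bad = {i for i, y in shares.items() if interpolate(pts, i) != y}
        if len(bad) <= t:  # candidate agrees with at least k + t of the shares
            return interpolate(pts, 0), bad
    return None, set()
```

Any candidate polynomial that agrees with at least k + t of the k + 2t submitted
shares agrees with at least k honest ones, so it must be the dealer's polynomial;
the shares that disagree with it are exactly the tampered ones.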

Identifiability is not possible with a dishonest majority for a simple reason: 
If half of the participants are dishonest they can run the sharing algorithm 
independently among themselves and return as their shares the output of the 
second run of the algorithm. This strategy makes it impossible for Rec to identify 
which half of the shares come from the first run of the Share algorithm and which 
come from the second since they are run independently. This is captured by the 
following theorem (see full version for proof): 


Theorem 3. (No identifiability with a dishonest majority) For any $t$, $n$,
$S$, $\mathbb{A}$ with $t \ge n/2$, $|S| \ge 2$ and $\mathbb{A} \ne \emptyset$, there
is no $(1/4, t)$-identifiable secret sharing scheme with secret space $S$ and access
structure $\mathbb{A}$.


3 Locally Identifiable Secret Sharing 


We now give our relaxation of identifiable secret sharing that can be realized 
when arbitrarily many players may be corrupted. Informally, the guarantee we 
require is that if the reconstruction fails, the reconstruction algorithm outputs 
a tuple of players to each player $P_i$ with the guarantee that if $P_i$ is honest,
the tuple returned to $P_i$ is precisely the set of players that tampered with their
shares. While this is equivalent to identifiability from the point of view of the
honest players, it allows us to circumvent the impossibility result of Theorem 3. Note
that we define LISS as being a special type of secret sharing scheme, so the usual 
correctness and secrecy requirements should hold in addition to the requirements 
detailed below. 


Definition 3. (Lists) Throughout this paper, when we refer to a list $\mathcal{L}$ we
refer to a subset of the players in the protocol
($\mathcal{L} \subseteq \{P_1, P_2, \ldots, P_n\}$).


Definition 4. (LISS) A secret sharing scheme realizing $\mathbb{A}$ is locally
$\delta$-identifiable if it satisfies the following requirements:


- Unanimity: For any adversary $\mathcal{A}$ and $s \in S$, the probability of
  $\mathcal{A}$'s success in the following game is at most $\delta$:
  1. $(s_1, s_2, \ldots, s_n) \leftarrow \mathsf{Share}(s)$;
  2. $\mathcal{A}$ outputs a set $C \subseteq [n]$ to corrupt and then receives
     $(s_i : i \in C)$;
  3. $\mathcal{A}$ outputs $(B, (s'_j)_{j \in C \cap B})$ such that
     $B \in \mathbb{A}$ and $B \not\subseteq C$;
  4. $\mathsf{Out} \leftarrow \mathsf{Rec}(B, (t_j)_{j \in B})$ where $t_j = s'_j$ if
     $j \in C$ and $t_j = s_j$ otherwise.
  The adversary succeeds unless:
  1. Reconstruction succeeds: $\mathsf{Out} = s$, or
  2. Each honest player's list is the list of all cheaters:
     $\mathsf{Out} = (\bot, (\mathcal{L}_j)_{j \in B})$ where for all
     $j \in B \setminus C$, $\mathcal{L}_j = \{P_i \in C \cap B : s'_i \ne s_i\}$.
- The scheme has Predictable Failures (Definition 5).


We briefly motivate the requirement of Predictable Failures before defining it. 
The problem to address is that the additional outputs $\mathcal{L}_i$, or even the event of
not reconstructing the secret, may leak some information concerning the secret 
unless a separate guarantee is made. This can cause a problem in applications 
and therefore we must have a way to simulate the actions of Rec in the case 
of tampering. Note that this is a new issue not present in identifiable secret 
sharing: As the Rec function does not simply output a list of tampering players, 
there are no a-priori guarantees concerning the lists corresponding to dishonest 
players and therefore we must make requirements on them separately. 


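Definition 5. (Predictable Failures) A secret sharing scheme has
$\delta$-Predictable Failures if there is an algorithm SRec such that for any
adversary $\mathcal{A}$ and $s \in S$, the probability of success in the following
game is less than $\delta$:


1. $(s_1, s_2, \ldots, s_n) \leftarrow \mathsf{Share}(s)$;

2. $\mathcal{A}$ outputs a set $C \subseteq [n]$ to corrupt and receives
   $(s_i)_{i \in C}$;

3. $\mathcal{A}$ outputs $(B, (s'_j)_{j \in C \cap B})$ such that $B \in \mathbb{A}$
   and $B \not\subseteq C$;

4. $\mathsf{SOut} \leftarrow \mathsf{SRec}(C, B, (s_i)_{i \in C}, (s'_i)_{i \in C \cap B})$;

5. $\mathsf{Out} \leftarrow \mathsf{Rec}(B, (t_j)_{j \in B})$ where $t_j = s'_j$ if
   $j \in C$ and $t_j = s_j$ otherwise.


$\mathcal{A}$ succeeds unless:


1. SRec correctly predicts success: $\mathsf{SOut} = \mathsf{SUCCESS}$ and
   $\mathsf{Out} = s$, or
2. SRec predicts the output of Rec: $\mathsf{SOut} = \mathsf{Out} \ne s$.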


3.1 Our Construction 


Let (Sh, Rc) be a secret sharing scheme realizing access structure $\mathbb{A}$ with
$\mathsf{Sh} : S \to \mathbb{F}^n$ where $\mathbb{F}$ is a field. Let $I_{n \times n}$
and $0_{n \times n}$ denote the identity and all-zero matrices, respectively. We use
$\mathbb{F}^{n \times n}$ to denote the set of all $n \times n$ matrices with elements
in $\mathbb{F}$ and $GL_n(\mathbb{F})$ to denote the set of all such invertible
matrices. For a matrix $M$ we will use $M(i,j)$ to denote the $(i,j)$ entry of $M$. By
default we assume vectors to be column vectors; we will use the notation $a^T$ when
referring to a row vector. We use the notation $\mathbb{F}^*$ to denote
$\mathbb{F} \setminus \{0\}$.


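Share(s):
1. Generate $(t_1, t_2, \ldots, t_n) \leftarrow \mathsf{Sh}(s)$ and
   $u_i, v_i \stackrel{R}{\leftarrow} \mathbb{F}^*$ for all $i \in [n]$;
2. Define $C_0 \in \mathbb{F}^{n \times n}$ as
   $C_0(i,j) = u_i^{n+1} v_j^{n+1} + u_i v_j + 1$ for $i \ne j$, and
   $C_0(i,i) = t_i$ for $i \in [n]$;
3. Define $C$ blockwise as
   $C = \begin{pmatrix} C_0 & I_{n \times n} \\ I_{n \times n} & 0_{n \times n} \end{pmatrix}$;
4. Generate $B \stackrel{R}{\leftarrow} GL_{2n}(\mathbb{F})$ and define $A = C B^{-1}$;
5. Label row $i$ of $A$ as $a_i^T$ and column $j$ of $B$ as $b_j$;
6. Return $(s_i = (a_i^T, b_i, u_i, v_i))_{i \in [n]}$.


Rec($D$, $(s_i = (a_i^T, b_i, u_i, v_i))_{i \in D}$) with $D \in \mathbb{A}$,
$a_i, b_i \in \mathbb{F}^{2n}$, $u_i, v_i \in \mathbb{F}^*$:

1. If for all $i \ne j$, $a_i^T b_j = u_i^{n+1} v_j^{n+1} + u_i v_j + 1$:
   - Set $t_i = a_i^T b_i$ for all $i \in D$;
   - Return $\mathsf{Rc}(D, (t_i : i \in D))$.

2. Else, for all $i \in D$ set:
   $\mathcal{L}_i = \{P_j : a_i^T b_j \ne u_i^{n+1} v_j^{n+1} + u_i v_j + 1$ or
   $a_j^T b_i \ne u_j^{n+1} v_i^{n+1} + u_j v_i + 1\}$;

3. Return $(\bot, (\mathcal{L}_i)_{i \in D})$.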
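

To make the construction concrete, the following Python sketch instantiates Share
and Rec over a prime field. It is an illustration under stated assumptions rather
than the authors' implementation: the modulus P, the helper names (mat_inv, dot,
expected), the use of n-out-of-n additive sharing as the base scheme (Sh, Rc), the
0-based player indices, and the check value $u_i^{n+1} v_j^{n+1} + u_i v_j + 1$ as
read from the construction are all assumptions of this sketch.

```python
import secrets

P = 2**31 - 1  # prime modulus standing in for the field F (illustrative)

def mat_inv(M):
    """Gauss-Jordan inverse over F_P; returns None if M is singular."""
    m = len(M)
    aug = [list(row) + [int(i == j) for j in range(m)] for i, row in enumerate(M)]
    for c in range(m):
        piv = next((r for r in range(c, m) if aug[r][c]), None)
        if piv is None:
            return None
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, P)
        aug[c] = [x * inv % P for x in aug[c]]
        for r in range(m):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

def expected(u, v, n):
    """Off-diagonal entry u^(n+1) v^(n+1) + u v + 1 of C_0."""
    return (pow(u * v, n + 1, P) + u * v + 1) % P

def share(s, n):
    # Base scheme (Sh, Rc): n-out-of-n additive sharing, an assumption of this sketch.
    t = [secrets.randbelow(P) for _ in range(n - 1)]
    t.append((s - sum(t)) % P)
    u = [1 + secrets.randbelow(P - 1) for _ in range(n)]  # uniform in F*
    v = [1 + secrets.randbelow(P - 1) for _ in range(n)]
    eye = [[int(i == j) for j in range(n)] for i in range(n)]
    C0 = [[t[i] if i == j else expected(u[i], v[j], n) for j in range(n)]
          for i in range(n)]
    # C = [[C0, I], [I, 0]], which is invertible for every C0.
    C = [C0[i] + eye[i] for i in range(n)] + [eye[i] + [0] * n for i in range(n)]
    B_inv = None
    while B_inv is None:  # rejection-sample B from GL_2n(F)
        B = [[secrets.randbelow(P) for _ in range(2 * n)] for _ in range(2 * n)]
        B_inv = mat_inv(B)
    cols_B = [list(col) for col in zip(*B)]
    cols_B_inv = list(zip(*B_inv))
    A = [[dot(row, col) for col in cols_B_inv] for row in C]  # A = C * B^-1
    return {i: (A[i], cols_B[i], u[i], v[i]) for i in range(n)}

def rec(shares):
    """shares: {i: (a_i, b_i, u_i, v_i)} for all n players; returns s or (None, lists)."""
    n, D = len(shares), sorted(shares)
    def ok(i, j):  # does a_i^T b_j match the value expected from u_i and v_j?
        return dot(shares[i][0], shares[j][1]) == expected(shares[i][2], shares[j][3], n)
    if all(ok(i, j) for i in D for j in D if i != j):
        return sum(dot(shares[i][0], shares[i][1]) for i in D) % P  # Rc for additive Sh
    return None, {i: {j for j in D if j != i and not (ok(i, j) and ok(j, i))} for i in D}
```

In this sketch, a player who changes its v value fails the pairwise check against
every honest player except with probability roughly n/|F| per check, so it appears
on every honest list while each honest player appears on the cheater's list.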


Theorem 4. If $\delta > n^2(n+1)/(|\mathbb{F}| - 1)$, the scheme described above is a
$\delta$-LISS scheme realizing $\mathbb{A}$ with secret space $S$ and share space
$S_i = \mathbb{F}^{4n+2}$.


Corollary 1. Suppose there is a secret sharing scheme which realizes an n-party
access structure $\mathbb{A}$ with secret space $S$ and share length $\beta$. Then, for
any $\delta > 0$ there is a $\delta$-LISS with the same $\mathbb{A}$ and $S$ whose share
length is $O(n \log(n/\delta) + n\beta)$.
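

As a worked instance of the share-length bound, the following sketch computes the
number of bits per share when the field is a binary extension field GF(2^m). The
choice of GF(2^m), the function name liss_share_bits, and the requirement that a
base share of beta bits fit in one field element are assumptions made for this
illustration.

```python
def liss_share_bits(n, kappa, beta):
    """Bits per LISS share over GF(2^m), for delta = 2^-kappa and beta-bit base shares."""
    # Theorem 4 asks for delta > n^2 (n+1) / (|F| - 1), i.e. |F| > n^2 (n+1) 2^kappa + 1.
    min_field_size = n * n * (n + 1) * 2**kappa + 2
    m = max(beta, (min_field_size - 1).bit_length())  # smallest m with 2^m large enough
    return (4 * n + 2) * m  # each share lies in F^(4n+2)
```

For example, with n = 4 players, kappa = 40 and one-bit base shares, the field needs
m = 47 bits and each share consists of 18 field elements.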


Outline of Security. A full proof of security is provided in the full version of
this paper, but we provide a brief intuition in this section to keep the exposition
self-contained. We first argue secrecy. Notice that the value $t_i$ is only used in
generating the row $a_i^T$; therefore any set of players $E \notin \mathbb{A}$ will
have its shares generated using only the $t_i$ values such that $P_i \in E$. The fact
that the joint distribution of these $t_i$ values does not depend on the underlying
secret (due to the perfect secrecy of (Sh, Rc)) implies secrecy.

Consider now an adversary that is attempting to tamper with the share of some
$P_i$ (and possibly others). We will argue that any such attempt will cause the
check in Rec between $P_i$ and any honest player $P_j$ to fail with high
probability. Assume that the adversary replaces $b_i \to b'_i$, $v_i \to v'_i$ with
at least one of these values changed (a similar argument will hold if the adversary
is tampering with $a_i^T$ or $u_i$). There are then two cases: either $b'_i$ is in
$\mathrm{Span}(\{b_k\}_{k \in T})$, where $T$ is the set of corrupted players, or it
is linearly independent of these values. If $b'_i$ is linearly independent, it can
be shown that $a_j^T b'_i$ is essentially uniformly distributed over $\mathbb{F}$
conditioned on the view of the adversary even after $u_j$ is fixed, and therefore the
probability that reconstruction succeeds will be very low (showing this statement
is non-trivial).

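On the other hand, consider the case where $b'_i \in \mathrm{Span}(\{b_k\}_{k \in T})$.
Let $b'_i = \sum_{k \in T} \beta_k b_k$ where $\beta_k \in \mathbb{F}$. Now, the check
will succeed only if:

$$a_j^T \sum_{k \in T} \beta_k b_k = u_j^{n+1} (v'_i)^{n+1} + u_j v'_i + 1$$
$$\iff \sum_{k \in T} \beta_k \left(u_j^{n+1} v_k^{n+1} + u_j v_k + 1\right)
      = u_j^{n+1} (v'_i)^{n+1} + u_j v'_i + 1.$$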


Similar to our argument of secrecy, the value $u_j$ is uniformly distributed
conditioned on every view of the adversary. Therefore, the check in Rec will succeed
only if the above equality is satisfied by a uniformly chosen $u_j \in \mathbb{F}^*$.
This will happen rarely unless the polynomials on the left and right of the equality
(considered as polynomials in $u_j$) are equal. For this equality to hold, we must
have $\beta_k = 0$ for all $k \ne i$ since otherwise $v_k \ne 0$ would make the
polynomials different. Next notice that we must have $\beta_i = 1$ for the constant
terms to match. Finally, the $u_j v_k$ term on the left implies that $v'_i = v_i$.
Therefore, unless $v'_i = v_i$ and $b'_i = b_i$ this equality will only occur with
low probability, which implies that if $P_i$ tampers with either the $v_i$ or $b_i$
value, it will be detected and placed on $P_j$'s list with high probability for all
honest $P_j$. A similar argument holds if either the $a_i^T$ or $u_i$ value is
tampered with, since the method of generation is equivalent to first generating
$A \in GL_{2n}(\mathbb{F})$ and setting $B = A^{-1}C$, as $C$ is always invertible.

Notice that we have actually argued that a dishonest player who modifies his 
share will be on the list of every honest player and symmetrically that all honest 
players will be on the list of such a dishonest player with high probability. This 
implies Predictable Failures since an adversary can easily tell which dishonest 
players will be on a given dishonest player’s list from the shares it has, as well 
as whether or not the honest players will be on his list depending on whether or 
not the player modified his share.


4 Relaxing Local Identifiability 


In this section we define a new commitment primitive, unanimously identifiable
commitments, that can be used as a leaner substitute for LISS in certain
applications. Additionally, we note that this commitment primitive implies a weaker
variant of LISS (called unanimously identifiable secret sharing) that can also be
used in our applications to MPC.


4.1 Unanimously Identifiable Commitments


A unanimously identifiable commitment (UIC) scheme has a single player (called
the sender) committed to a value $s \in S$ by having a trusted dealer send
commitments $c_i$ to all other players in the protocol and decommitment information
$d$ to the sender such that any tampering with the $d$ value will cause all honest
players to either reconstruct the original secret or fail reconstruction
simultaneously. As with standard commitments, $(c_i)_{i \in [n]}$ should leak no
information concerning $s$.


Definition 6. (Unanimously Identifiable Commitments) A $\delta$-UIC
scheme consists of a randomized algorithm Offline and a deterministic algorithm
Decommit with the following syntax:


1. Offline: $S \to C^n \times D$. Takes as input a secret $s \in S$ and outputs n
   commitments $c_1, c_2, \ldots, c_n$ and decommitment information $d$.

2. Decommit: $C \times D \to S \cup \{\bot\}$. Takes as input $c_i$ and the
   decommitment information $d$ and recreates the secret $s$ or outputs $\bot$,
   indicating failure.


The algorithms (Offline, Decommit) should satisfy:


- Completeness. For any $s \in S$, if
  $\Pr[\mathsf{Offline}(s) = (c_1, c_2, \ldots, c_n, d)] > 0$ then
  $\mathsf{Decommit}(c_i, d) = s$ for any $i \in [n]$.

- Secrecy. The values $c_1, c_2, \ldots, c_n$ reveal no information concerning $s$.
  Formally, for any $c = (c_1, c_2, \ldots, c_n)$ and any $s, s' \in S$, the
  probability that the first n values of $\mathsf{Offline}(s)$ are $c$ is equal to
  the probability that the first n values of $\mathsf{Offline}(s')$ are $c$.


We now present the final requirement placed on this primitive for use in our
applications. In the full version of this paper, we include further intuition for the
necessity of this condition but omit it here owing to space restrictions.


There exist simulators $W_1$, $W_2$ such that the two guarantees described below
hold with probability at least $1 - \delta$ for any adversary $\mathcal{A}$. Consider
the following experiment, where $Q$ denotes the sender:


1. The adversary $\mathcal{A}$ outputs a set $T \subseteq [n] \cup \{Q\}$ of players
   to corrupt;

2. $(c_1, c_2, \ldots, c_n, d) \leftarrow \mathsf{Offline}(s)$;

3. For all $i \in T \cap [n]$ send $c_i$ to the adversary; if $Q \in T$ send $d$ to
   the adversary;

4. If $Q \notin T$, set $dec = d$; otherwise, $dec$ is output by $\mathcal{A}$;

5. For all $i \in T \cap [n]$, $\mathcal{A}$ outputs $(c'_i, i)$, fake commitment
   information for $P_i$.


The guarantees around this experiment are as follows:


- Binding with Agreement on Abort. $\mathsf{Decommit}(c_i, dec) = s$ for all
  uncorrupted $P_i$, or $\mathsf{Decommit}(c_i, dec) = \bot$ for all uncorrupted $P_i$.
- Simulatable Abort. Let $V$ be the view of $\mathcal{A}$ at the end of step 5. Then:
  1. If $\mathcal{A}$ corrupted $Q$:
     $W_1(V)$ correctly predicts whether $\mathsf{Decommit}(c_i, dec) = \bot$ for all
     $i \in [n]$.
  2. Otherwise:
     $W_2(V, c'_i)$ correctly predicts whether $\mathsf{Decommit}(c'_i, d) = \bot$ for
     each $i \in T \cap [n]$.


4.2 A Unanimously Identifiable Commitment Scheme


Let $\mathbb{F}$ be a field. We now give a simple construction of a $\delta$-UIC
scheme with $S = \mathbb{F}$, $C = \mathbb{F}^2$ and $D = \mathbb{F}^{n+2}$.


Offline(s):

1. Generate $P(X)$, a random polynomial of degree $n+1$ over $\mathbb{F}$ such that
   $P(0) = s$;
2. For all $i \in [n]$ generate $x_i \stackrel{R}{\leftarrow} \mathbb{F}$ and let
   $y_i = P(x_i)$;
3. Set $c_i = (x_i, y_i)$ and $d = P(X)$. Return $((c_i)_{i \in [n]}, d)$.


Decommit($c_i = (x_i, y_i)$, $d = P(X)$ of degree $n+1$): If $P(x_i) \ne y_i$ return
$\bot$. Else, return $P(0)$.


Theorem 5. Let $|\mathbb{F}| > (n+1)^2 \delta^{-1} + 1$. The scheme described above is
a $\delta$-UIC with $S = \mathbb{F}$, $C = \mathbb{F}^2$ and $D = \mathbb{F}^{n+2}$.
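

The scheme is short enough to sketch directly in Python. The prime modulus P, the
function names, and the use of None to stand for the failure symbol are illustrative
assumptions. The evaluation points are sampled from the nonzero field elements here
so that no commitment equals (0, P(0)); this restriction is an assumption of the
sketch.

```python
import secrets

P = 2**61 - 1  # prime modulus standing in for the field F (illustrative)

def poly_eval(coeffs, x):
    """Horner evaluation of sum(coeffs[e] * x^e) modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def offline(s, n):
    """Dealer: returns ([c_1..c_n], d) with c_i = (x_i, y_i) and d the coefficients."""
    d = [s % P] + [secrets.randbelow(P) for _ in range(n + 1)]  # degree <= n+1, value s at 0
    cs = []
    for _ in range(n):
        x = 1 + secrets.randbelow(P - 1)  # nonzero point: an assumption of this sketch
        cs.append((x, poly_eval(d, x)))
    return cs, d

def decommit(c, d):
    """Return the committed value, or None (standing for failure) on a mismatch."""
    x, y = c
    return poly_eval(d, 0) if poly_eval(d, x) == y else None
```

A sender who wants some honest player to accept a modified polynomial must make it
agree with the original at that player's hidden point, and two distinct polynomials
of degree at most n+1 agree on at most n+1 points.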


Related Concepts. In our applications, we mainly use UIC as a substitute 
for digital signatures. There are some other unconditional notions that have also 
been introduced for similar purposes (such as pseudosignatures [26], distributed
commitments [10] and IC signatures [25]). While the construction itself is not
novel (for example, it is used in [25]), the property that all of the honest players 
accept or reject the same commitment is crucial to our applications and differs 
from the guarantees placed on the other primitives. 


4.3 Unanimously Identifiable Secret Sharing


We note that unanimously identifiable commitments actually imply a weaker
notion of LISS which we call Unanimously Identifiable Secret Sharing (UISS).
The security requirements for a UISS scheme are identical to the requirements
for LISS except that the requirement:

  • Each honest player’s list is the list of all cheaters

is replaced by the requirement:

  • Each honest player’s list is the same subset of corrupted players:
    $\mathsf{Out} = (\bot, (\mathcal{L}_j)_{j \in B})$ where for all
    $j, j' \in B \setminus T$, $\mathcal{L}_j \subseteq T$ and
    $\mathcal{L}_j = \mathcal{L}_{j'}$.

All other requirements remain unchanged, including the requirement of
predictable failures. Implementing UISS for access structure $\mathbb{A}$ using a UIC
scheme is straightforward by having each user commit to its share. Note that for
most applications, UISS can take the role of LISS, at the cost of not necessarily
identifying all tampered shares if reconstruction fails.


A secure multiparty computation (MPC) protocol allows a set of players to
compute a function evaluated on their individual inputs while revealing no
information other than the output of the function. We assume familiarity with
(standalone) MPC throughout this section and refer the reader to [4] for formal
definitions.


5.1 Model of Computation 


By default, we consider static, computationally unbounded adversaries who may
corrupt up to t of the n parties (t = n by default). We consider active
adversaries, who may arbitrarily control the corrupted players; passive adversaries,
who can only observe the internal state of corrupted players; and fail-stop
adversaries, who behave like passive adversaries except that they may make corrupted
players stop sending messages. Our network model is synchronous with
point-to-point channels and a broadcast channel.

The security of an MPC protocol with respect to an ideal functionality f is 
defined by comparing a real world execution of the protocol to an ideal model 
execution where a trusted party evaluates f. By default, we refer to statistical 
security, where the statistical advantage of distinguishing between the real world
and the ideal model execution is bounded by $2^{-\kappa}$ for a statistical security
parameter $\kappa$. We will only consider the case of secure function evaluation, in which
f is stateless. We will mostly consider fully secure MPC in which the ideal model 
adversary cannot prevent the trusted party from sending the outputs of f to the 
honest players. Full security cannot be achieved even for simple functionalities 
such as coin-flipping [8] without an honest majority or other assumptions we will 
discuss. This impossibility holds even with trusted preprocessing; however, in the 
latter model the assumptions of secure point-to-point channels and a broadcast 
primitive are unnecessary as they can be implemented unconditionally [27]. 


5.2 Complete Primitives for MPC 


An n-party functionality g is called a complete primitive for n-party MPC if it 
is possible to securely realize any n-party functionality f in the g-hybrid model, 
namely by using ideal calls to g. Here we consider security against an active 
adversary who may corrupt an arbitrary number of players. 

In prior works, such primitives either depend on the complexity of the function
being evaluated [11] or rely on cryptographic assumptions [13]. It remained
open to construct an unconditionally complete primitive whose complexity is
independent of the complexity of the evaluated function f. In the following
section, we show how to construct such a primitive. Our contribution can be seen as
identifying a cryptographic LISS scheme implicitly present in the construction
of Gordon et al. [13] and replacing it with an unconditional construction. In fact,
it suffices for this purpose to rely on UISS rather than LISS. For simplicity, we
assume that the functionality f being evaluated using g delivers the same output
to all players; the general case is handled similarly.


Unconditional Primitive. The first primitive we present is complete for sta- 
tistically secure MPC and its complexity depends only on the output length of 
the evaluated functionality f. We give an informal description of the primitive 
in this version and defer further details to the full version. For expository pur- 
poses, we will describe three separate primitives that make up the three modes 
of operation for the complete primitive. 


- $\mathsf{FCR}^1_1$: Takes as input a bit from a player, runs an n-out-of-n UISS
  sharing algorithm on this bit and distributes the shares amongst all players.

- $\mathsf{FCR}^1_2$: Takes as input two n-tuples of shares from the UISS scheme.
  Internally, the primitive reconstructs the underlying secrets of each, evaluates
  the NAND of the two secrets, and re-shares the output using the UISS scheme.
  If reconstruction fails, the functionality will use the lists $\mathcal{L}_i$
  output by the UISS scheme to partition the players: If any player is on his own
  list, the functionality declares this player disqualified and his input is
  replaced by a default value by all players. If not, the functionality outputs a
  partition of the players: $P_i$ and $P_j$ remain in the same partition if
  $\mathcal{L}_j = \mathcal{L}_i$.

- $\mathsf{FCR}^1_3$: Takes as input $\beta$ separate n-tuples of UISS shares, where
  $\beta$ is an output length parameter. The functionality either reconstructs each
  secret and broadcasts all the reconstructed bits, or, if some reconstruction
  fails, partitions the players as in the previous mode using the first instance of
  failed reconstruction.
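

The failure-handling rule shared by the second and third modes can be sketched as
follows. The function name resolve_failure, the dictionary representation of the
lists, and the decision to report every self-listed player at once are assumptions
of this illustration.

```python
def resolve_failure(lists):
    """lists: {player: set of players on that player's list} after a failed reconstruction."""
    # A player that appears on its own list is disqualified outright.
    disqualified = sorted(p for p, accused in lists.items() if p in accused)
    if disqualified:
        return "disqualify", disqualified
    # Otherwise players stay together exactly when their lists are equal.
    groups = {}
    for p, accused in lists.items():
        groups.setdefault(frozenset(accused), []).append(p)
    return "partition", sorted(sorted(g) for g in groups.values())
```

Since all honest players hold the same list under UISS, they always land in the same
block of the resulting partition.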


Note that while the first two primitives are randomized, they can be made deter- 
ministic by using a standard reduction: the internal randomness can be securely 
emulated by taking the XOR of shares contributed by the n players. 

Using the above primitive, one can securely evaluate any boolean circuit C,
which consists of NAND gates and has $\beta$ output bits, in the following way. The
players first use $\mathsf{FCR}^1_1$ to share each of their input bits. After this
phase is completed, for each gate in C the players use $\mathsf{FCR}^1_2$ to compute
shares of each internal value in C. Finally, the players feed the shares of the
output values to $\mathsf{FCR}^1_3$ and receive the outputs of C.
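
The gate-by-gate evaluation can be mimicked by a toy simulation of an honest
execution. In this sketch plain XOR sharing stands in for the UISS scheme, so there
is no cheater detection at all; the function names f_input, f_nand, f_output and
evaluate, and the encoding of a circuit as a list of (output wire, input wire, input
wire) triples, are assumptions made for illustration.

```python
import secrets
from functools import reduce
from operator import xor

def xor_share(bit, n):
    """Toy n-out-of-n XOR sharing; a stand-in for UISS without any identifiability."""
    shares = [secrets.randbelow(2) for _ in range(n - 1)]
    return shares + [reduce(xor, shares, bit)]

def xor_rec(shares):
    return reduce(xor, shares, 0)

def f_input(bit, n):
    """First mode: share one input bit among the n players."""
    return xor_share(bit, n)

def f_nand(sh_a, sh_b):
    """Second mode: reconstruct both secrets, apply NAND, re-share the result."""
    out = 1 - (xor_rec(sh_a) & xor_rec(sh_b))
    return xor_share(out, len(sh_a))

def f_output(sharings):
    """Third mode: reconstruct every output bit and broadcast it."""
    return [xor_rec(sh) for sh in sharings]

def evaluate(circuit, inputs, outputs, n):
    """circuit: NAND gates (out, a, b) in topological order; inputs: {wire: bit}."""
    wires = {w: f_input(b, n) for w, b in inputs.items()}
    for out, a, b in circuit:
        wires[out] = f_nand(wires[a], wires[b])
    return f_output([wires[w] for w in outputs])
```

For instance, XOR of two bits can be written with four NAND gates and evaluated by
calling evaluate with three players.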

Notice that any deviation from the above protocol will result in all honest
players identifying the same set of cheaters, and therefore their lists
$\mathcal{L}_i$ will be identical. In this case, they are partitioned and the
protocol is re-started with default values substituted for the inputs of the
corrupted players. Due to the guarantees of the UISS scheme, the partitions can be
simulated given only the views of the corrupted players. Defining the three modes of
operation as one primitive that can be called on only some partition of the players
requires some additional technical steps to fit in with our model of one trusted
primitive. In addition to the players declaring which mode they are using, the
primitive should also take as input from each player the set of players this player
still trusts (as in [13]); we detail this in the full version of this paper.

The above complete primitive yields the following theorem.


Theorem 6. There is a deterministic, polynomial-time computable functionality
g with input and output size $\mathrm{poly}(n, \kappa, \beta)$ such that any
functionality f computed by a circuit of size $\sigma$ and output length $\beta$ can
be realized with full statistical security (and $2^{-\kappa}$ simulation error) using
$\mathrm{poly}(n, \sigma)$ calls to g.


Reducing the Number of Calls. Our second primitive improves on the efficiency
of the first by requiring fewer calls, but requires a preprocessing phase which
is implemented using an MPC protocol with identifiability on aborts (in other words,
if the protocol fails then all honest players agree on the identity of a corrupted
player). Settling for computational security, such a protocol can be based on the
existence of (two-party) oblivious transfer [12].

The protocol for f begins by having the players run an MPC protocol as
above to compute UISS-shares of the output of f. In case the preliminary MPC
protocol fails, all players disqualify the player that caused the abort and restart
the protocol by using a default value as the input of disqualified players.

We now describe the second primitive which is used to complete the protocol. 


- $\mathsf{FCR}^2$ takes as input an n-tuple of UISS shares for a $\beta$-bit secret
  and reconstructs the secret. In case reconstruction succeeds the primitive
  returns the reconstructed value to all players. If reconstruction fails, the
  primitive outputs a partition of the players by the lists output by the UISS
  scheme as in $\mathsf{FCR}^1$.


The protocol for f proceeds by repeatedly interleaving the preliminary
(computational) MPC with calls to $\mathsf{FCR}^2$ until an output value is
successfully reconstructed by the latter. Each failure results in the honest players
disqualifying at least one corrupted player. As before, at each point of the
protocol all honest players agree on the identity of disqualified players.


Theorem 7. Suppose an oblivious transfer protocol exists with computational
security parameter $\lambda$. Then there is a deterministic, polynomial-time
computable n-party functionality g with input and output size
$\mathrm{poly}(n, \beta, \kappa)$ such that any polynomial-time computable f with
output size $\beta$ can be realized with full computational security, up to
$\mathrm{neg}(\lambda) + 2^{-\kappa}$ simulation error, using at most n calls to g.


In the full version we give two variants on the above theorem that eliminate 
the dependence on the output length at the price of increased complexity of the 
MPC phase, and reduce the number of calls to 1 at the price of increasing the 
complexity of the primitive exponentially in n. 


5.3 Partial Fairness with Preprocessing 


In this section we briefly sketch how the unanimously identifiable commitments 
(UIC) primitive can be used with the partially fair MPC protocols of Beimel et al. 
PI] to eliminate the assumption of cryptographic signatures in the preprocessing 
model.


Construction with an Off-Line Dealer. The MPC protocols from BHE]
achieve unconditional security against fail-stop adversaries (with a non-negligible
error 1/p) given a trusted preprocessing phase in which a dealer sends some
secret information to each player. This information contains the messages each
player should send during the protocol, but the choice of which message is sent
may depend on the (public) identity of the players that aborted up to this point.
To upgrade the security of such a protocol to hold against active adversaries,
Beimel et al. rely on digital signatures to ensure that players do not deviate from
their designated messages. Our observation is that one could instead rely on the
UIC primitive by having the dealer give to the player who should send a message
the decommitment information for this message and to all other players the
corresponding commitments. Then, if a corrupted player attempts to modify this
decommitment information, all honest players will recognize this simultaneously
and continue the execution as if this player had aborted.
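
A single online round of this upgraded protocol might look as follows in Python. This
is a sketch under assumptions: it uses a polynomial-based commitment over a prime
field as the UIC, and the names deal_message and online_round, the modulus P, and the
representation of the abort pattern as a set are all hypothetical.

```python
import secrets

P = 2**61 - 1  # prime modulus; illustrative

def ev(coeffs, x):
    """Horner evaluation modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def deal_message(msg, receivers):
    """Dealer: decommitment d for the sender, one commitment (x, y) per receiver."""
    d = [msg % P] + [secrets.randbelow(P) for _ in range(len(receivers) + 1)]
    coms = {}
    for r in receivers:
        x = 1 + secrets.randbelow(P - 1)  # hidden evaluation point of receiver r
        coms[r] = (x, ev(d, x))
    return d, coms

def online_round(sender, sent_d, coms, aborted):
    """Honest receivers check the decommitment; a failed check is treated as an abort."""
    if any(ev(sent_d, x) != y for x, y in coms.values()):
        aborted.add(sender)  # everyone continues as if the sender had aborted
        return None
    return ev(sent_d, 0)  # the designated message
```

The sketch models only honest receivers; it relies on the agreement-on-abort property
to justify treating the check as a single global decision.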

Note that when considering general MPC in this model (rather than
coin-flipping), it may be useful to allow the preprocessing stage to depend on the
players’ inputs. We refer to such a preprocessing phase as input-dependent
preprocessing. Since we require the outputs of the protocol to be unpredictable
at the end of the preprocessing phase,¹ input-dependent preprocessing cannot be
used to trivially solve the problem by simply delivering the outputs of f to the
players.


Theorem 8. Let P be an r-round protocol with input-dependent preprocessing,
which realizes F with $\epsilon$-security against fail-stop adversaries who can corrupt
up to t players. Furthermore, suppose that the online phase of P has the
following structure: in each round, each player sends a subset of the messages it
had received in the preprocessing phase, where the identity of this subset can be
computed publicly from the pattern of aborts up to this round. Then, there is a
protocol P’ with the same features as P except that it is
$(\epsilon + 2^{-\kappa})$-secure against active adversaries.


In the case of randomized functionalities with no inputs, the above theorem does 
not require the preprocessing to depend on any inputs. In particular, applying 
the above theorem to the coin-flipping protocol with preprocessing implicit in 
the construction from [2], we get the following corollary. 


Theorem 9. Assume preprocessing by a trusted off-line dealer. Fix constants n 
and t such that t < 2n/3. Then, for any r, there is an r-round n-party uncon- 
ditionally secure coin-flipping protocol over point-to-point channels tolerating up 
to t malicious players with bias O(1/r). 


In the full version we present a variant of our general UIC-based technique 
which can make the preprocessing phase independent of the inputs. This variant 
efficiently applies only when the number of players is constant and the input and 
output domain of each player is polynomially bounded in the security parameter. 


1 More precisely, security in the preprocessing model requires to simulate the adver- 
sary’s view in the preprocessing phase before invoking the ideal functionality. 
Applying this variant to general MPC protocols with 1/p-security from [i], we
obtain the following theorem. 


Theorem 10. Assume preprocessing by a trusted off-line dealer. Let n and t be 
constants such that n/2 < t < 2n/3 and F be a deterministic n-party functionality
with input domain bounded by a polynomial $d(\kappa)$ for each player. Then, for
any polynomial $p(\kappa)$, there is a polynomial-time r-round 1/p-secure protocol for
F which tolerates up to t corrupt players with r = pdr”. 
Abstract. Yao’s garbled-circuit approach enables constant-round secure
two-party computation of any function. In Yao’s original construction, 
each gate in the circuit requires the parties to perform a constant num- 
ber of encryptions/decryptions and to send/receive a constant number 
of ciphertexts. Kolesnikov and Schneider (ICALP 2008) proposed an im- 
provement that allows XOR gates to be evaluated “for free,” incurring 
no cryptographic operations and zero communication. Their “free-XOR” 
technique has proven very popular, and has been shown to improve per- 
formance of garbled-circuit protocols by up to a factor of 4. 

Kolesnikov and Schneider proved security of their approach in the 
random oracle model, and claimed that (an unspecified variant of) cor- 
relation robustness suffices; this claim has been repeated in subsequent 
work, and similar ideas have since been used in other contexts. We show 
that the free-XOR technique cannot be proven secure based on correla- 
tion robustness alone; somewhat surprisingly, some form of circular se- 
curity is also required. We propose an appropriate definition of security 
for hash functions capturing the necessary requirements, and prove secu- 
rity of the free-XOR approach when instantiated with any hash function 
satisfying our definition. 

Our results do not impact the security of the free-XOR technique in 
practice, or imply an error in the free-XOR work, but instead pin down 
the assumptions needed to prove security. 


1 Introduction 


Generic protocols for secure two-party computation have been known for over 
25 years [35,13]. (By “generic” we mean that the protocol is constructed by
starting with a boolean or arithmetic circuit for the function of interest.) For 
most of that time, generic secure two-party computation was viewed as being 
only of theoretical interest; much effort was instead devoted to developing more 
efficient, “tailored” protocols for specific functions of interest. 

In recent years, however, a number of works have shown that generic protocols 
for secure two-party computation may be much more attractive than previously 
thought. This line of work was initiated by Fairplay [29], which gave an imple- 
mentation of Yao’s garbled-circuit protocol [35] secure in the semi-honest setting. 
Subsequent works showed improvements in the scalability, efficiency, and usability
of garbled circuits [7240920], extended the garbled-circuit technique to
give implementations in the malicious setting [28,33,34], and explored alternatives
to the garbled-circuit approach [2317/1137].

As secure computation moves from theory to practice, even small improve- 
ments can have a significant effect. (Three factor-of-2 improvements can reduce 
the time from, say, 1 minute to under 8 seconds.) Indeed, several such improve- 
ments have been proposed for the garbled-circuit approach: e.g., the point-and- 
permute technique [29] that reduces the circuit evaluator’s work (per gate) from 
four decryptions to one, or garbled-row reduction [30,33] that reduces the number
of ciphertexts transmitted per garbled gate from four to three. 

It is in this spirit that Kolesnikov and Schneider introduced their very influ- 
ential “free-XOR” approach for improving the efficiency of garbled-circuit 
constructions. (The free-XOR optimization is compatible with both the point- 
and-permute technique and garbled-row reduction.) Yao’s original construction 
requires a garbled gate for each boolean gate in the circuit of the function be- 
ing computed. The free-XOR technique allows XOR gates in the underlying 
circuit to be evaluated “for free,” without the need to construct a corresponding
garbled gate. (We defer the technical details to Section 2.2.) XOR gates in
the underlying circuit therefore incur no communication cost or cryptographic
operations. Because of this, as documented in [26,25,33], it is worth investing
the effort to minimize the number of non-XOR gates in the underlying circuit
(even if the total number of gates is increased); this results in roughly a 40%
overall efficiency improvement for “typical” circuits. For some circuits (e.g.,
basic arithmetic operations, universal circuits) a factor-of-4 improvement is
observed [26,25]. Nowadays, all implementations of garbled-circuit protocols use
the free-XOR idea to improve performance [33,17,34,24,19,20].


1.1 Security of the Free-XOR Technique? 


Given the popularity of the free-XOR technique, it is natural to ask what
assumptions are necessary to prove it secure (see footnote 1). The
free-XOR approach relies on a cryptographic hash function H. Kolesnikov and
Schneider [26] gave a proof of security for the free-XOR technique when H
is modeled as a random oracle, and claimed that (a variant of) correlation
robustness [21,14] would be sufficient; this claim has been repeated in several
subsequent works [33,3,7]. (Informally, correlation robustness implies that
$H(k_1 \oplus R), \ldots, H(k_t \oplus R)$ are all pseudorandom, even given $k_1, \ldots, k_t$, when R is
chosen at random. In the context of the free-XOR technique we must consider
hash functions taking two inputs. Formal definitions are given in Section 2.3.)
Correlation robustness is a relatively mild assumption, and has the advantage 


1 It may be interesting to recall here that XOR gates are also “free” when using the
GMW approach to secure two-party computation [13]. In that setting, no additional
assumptions are needed. 
relative to the random-oracle model of being (potentially) falsifiable. Moreover,
correlation robustness is already required by existing protocols for
oblivious-transfer extension [21], which are used in current efficient implementations of
secure two-party computation. 


Our Results. It is unclear exactly what variant of correlation robustness is 
needed to prove security of the free-XOR approach, and Kolesnikov and Schnei- 
der (as well as subsequent researchers relying on their result) have left this ques- 
tion unanswered. We show here that the natural variant of correlation robustness 
(for hash functions taking two inputs instead of one) is not sufficient. We de- 
scribe where the obvious attempt to prove security fails, and moreover show 
an explicit counterexample (in the random-oracle model) of a correlation-robust 
hash function H for which the free-XOR approach is demonstrably insecure. 

We observe that the difficulty is due to a previously unnoticed circularity 
in the free-XOR construction: in essence, the issue is that $H(k_1 \oplus R)$ is used to
encrypt both $k_2$ and $k_2 \oplus R$. (The actual issue is more involved, and depends on
the details of the free-XOR approach; see Section ]) We thus define a notion 
of circular correlation robustness, and show that any hash function satisfying 
this definition can be used to securely instantiate the free-XOR technique. Our 
definition is falsifiable, and is still weaker than modeling H as a random oracle. 
Our work can be viewed as following the line of research suggested in [10] whose 
goal is to formalize, and show usefulness of, various concrete properties satisfied 
by a random oracle. 

Besides the original work of Kolesnikov and Schneider, our results also impact 
security claims made in two other recent papers. Nielsen and Orlandi use an 
idea similar to that used in the free-XOR approach to construct a (new) protocol 
for two-party computation secure against malicious adversaries. They, too, prove 
security in the random-oracle model but claim that correlation robustness suf- 
fices; their construction appears to have the same issues with circularity that the 
free-XOR technique has. Applebaum et al. [3] define a notion of security against 
passive related-key attacks for encryption schemes, and claim that encryption 
schemes satisfying this notion can be used to securely instantiate the free-XOR 
approach (see [3, Section 1.1.2]). However, their definition of related-key attacks
does not take into account any notion of circular security, which appears to be 
necessary for the free-XOR technique to be sound. We conjecture that our new 
definition of circular correlation robustness suffices to prove security in each of 
the above works. 

We do not claim that our work has any impact on the security of the free-XOR
technique (or the protocols of [32,3]) in practice; in most cases, protocol
implementors seem content to assume the random-oracle model anyway. Never- 
theless, it is important to understand the precise assumptions needed to prove 
these protocols secure. We also do not claim any explicit error in the work of 
Kolesnikov and Schneider [26], as they only say that some variant of correlation 
robustness should suffice. Our work pins down exactly what variant of correlation 
robustness is necessary. 
1.2 Related Work


The notion of correlation robustness was introduced by Ishai et al. [21], and has 
been used in several other works since then [162218233]. Applebaum et al. [3] 
and Goyal et al. further study the notion, explore various definitions, and 
show connections to security against related-key attacks [55H24]. To the best of 
our knowledge, none of the previous definitions of correlation robustness given 
in the literature suffice to prove security of the free-XOR technique. 

As mentioned above, we define a notion of security for hash functions that 
blends correlation robustness and circular security. The latter notion, as well as 
the more general notion of key-dependent-message security, has seen a significant
amount of attention recently [6[15/18/2/81119}.


1.3 Organization 


We review Yao’s garbled-circuit construction, and the free-XOR modification of 
it, in Section 2. In that section we also define a notion of correlation robustness
that is syntactically suitable for trying to prove security of the free-XOR
approach. In Section 3 we explain where a reductionist proof of security for the
free-XOR approach fails when trying to base security on correlation robustness
alone. We then demonstrate that no proof of security is possible by showing an
example of a correlation-robust hash function for which the free-XOR approach
is demonstrably insecure. This motivates our definition of a stronger notion of
security for hash functions in Section 4, one that we show suffices for proving
security of the free-XOR technique. 


2 Preliminaries 


2.1 Yao’s Garbled Circuit Construction 


Yao’s garbled-circuit approach [35], in combination with any oblivious-transfer 
protocol, yields a constant-round protocol for two-party computation 
with security against semi-honest parties. We review only those aspects of the 
construction needed to understand our results; for further details, we refer 
to [26,27].

Fix a boolean circuit C known to both parties. (For simplicity, we assume 
the circuit C outputs a single bit; the protocol can be easily extended to handle 
multi-bit outputs.) One party, the garbled-circuit generator, prepares a garbled 
version of the circuit as follows. First, two random keys $w_i^0, w_i^1$ are associated
with each wire i in the circuit; key $w_i^0$ corresponds to the value ‘0’ on wire i, while
$w_i^1$ corresponds to the value ‘1’. For each wire i, a random bit $\pi_i$ is also chosen;
key $w_i^b$ is assigned the label $\lambda_i^b = b \oplus \pi_i$. For each gate $g : \{0,1\}^2 \to \{0,1\}$ in the
circuit, with input wires i, j and output wire k, the circuit generator constructs a
“garbled gate” that will enable the other party to recover $w_k^{g(b_i,b_j)}$ (and its label)
from $w_i^{b_i}$ and $w_j^{b_j}$ (and their corresponding labels). The garbled gate consists of
the four ciphertexts:

$$\mathrm{Enc}^g_{w_i^{\pi_i},\, w_j^{\pi_j}}\bigl(w_k^{g(\pi_i,\pi_j)} \,\|\, g(\pi_i,\pi_j) \oplus \pi_k\bigr) \tag{1}$$
$$\mathrm{Enc}^g_{w_i^{\pi_i},\, w_j^{1\oplus\pi_j}}\bigl(w_k^{g(\pi_i,1\oplus\pi_j)} \,\|\, g(\pi_i,1\oplus\pi_j) \oplus \pi_k\bigr) \tag{2}$$
$$\mathrm{Enc}^g_{w_i^{1\oplus\pi_i},\, w_j^{\pi_j}}\bigl(w_k^{g(1\oplus\pi_i,\pi_j)} \,\|\, g(1\oplus\pi_i,\pi_j) \oplus \pi_k\bigr) \tag{3}$$
$$\mathrm{Enc}^g_{w_i^{1\oplus\pi_i},\, w_j^{1\oplus\pi_j}}\bigl(w_k^{g(1\oplus\pi_i,1\oplus\pi_j)} \,\|\, g(1\oplus\pi_i,1\oplus\pi_j) \oplus \pi_k\bigr) \tag{4}$$

in that order. (In the above, we use $\mathrm{Enc}^g_{w,w'}(\cdot)$ to denote encryption under the
two keys w and w’ that may also depend on the gate number g. The exact
details of the encryption will be specified in the next section, but for concreteness
the reader can for now think of it as being instantiated by $\mathrm{Enc}^g_{w,w'}(m) =
\mathrm{Enc}_w(\mathrm{Enc}_{w'}(m))$ with the gate number being ignored. We use here the point-and-permute
technique so that the circuit evaluator only needs to decrypt a single
ciphertext per garbled gate.) To evaluate this garbled gate, the circuit evaluator
who holds $w_i^{b_i}, \lambda_i^{b_i}$ and $w_j^{b_j}, \lambda_j^{b_j}$ uses those keys to decrypt the ciphertext
at position $\lambda_i^{b_i}, \lambda_j^{b_j}$ of the above array; this will recover $w_k^{g(b_i,b_j)} \,\|\, \lambda_k^{g(b_i,b_j)}$,
where $\lambda_k^{g(b_i,b_j)} = g(b_i,b_j) \oplus \pi_k$ as required.

Let $i_1, \ldots, i_\ell$ denote the input wires of the circuit. With garbled gates
constructed as above for each gate of the circuit (and transmitted to the circuit
evaluator), we see that given keys $w_{i_1}^{b_{i_1}}, \ldots, w_{i_\ell}^{b_{i_\ell}}$ (and their corresponding labels)
for the input wires, the circuit evaluator can inductively compute a key (and its
label) for the output wire. The keys for input wires belonging to the circuit generator
can simply be transmitted to the circuit evaluator along with the garbled
gates; the keys for input wires belonging to the circuit evaluator are obtained
by the circuit evaluator using oblivious transfer (OT). If the circuit generator
also sends $\pi_o$ for the output wire o, the circuit evaluator can obtain the correct
boolean output of the circuit on the given inputs.

The above thus defines a protocol for two-party computation in the OT-hybrid 
model. If encryption is instantiated via $\mathrm{Enc}^g_{w,w'}(m) = \mathrm{Enc}_w(\mathrm{Enc}_{w'}(m))$ and Enc
is a CPA-secure symmetric-key encryption scheme, the protocol is secure against
semi-honest adversaries [35,27].


2.2 The Free-XOR Technique 


Kolesnikov and Schneider [26] suggested that instead of choosing the keys $w_i^0, w_i^1$
for each wire i independently at random, one could instead (1) choose a global
random value R, (2) choose $w_i^0$ uniformly and independently at random for every
wire i that is not the output of an XOR gate, and (3) set $w_i^1 = w_i^0 \oplus R$. Each
such wire is also assigned a random bit $\pi_i$ as before. If k is the output wire
of an XOR gate with input wires i, j (whose keys have already been defined),
then the keys for wire k are set to be $w_k^0 = w_i^0 \oplus w_j^0$ and $w_k^1 = w_k^0 \oplus R$; also, $\pi_k$
is set to be $\pi_k = \pi_i \oplus \pi_j$. If keys are chosen this way, then for any XOR gate
as above the circuit evaluator holding $w_i^{b_i}, \lambda_i^{b_i}$ and $w_j^{b_j}, \lambda_j^{b_j}$ can simply compute
$w_k^{b_i \oplus b_j} = w_i^{b_i} \oplus w_j^{b_j}$ and $\lambda_k^{b_i \oplus b_j} = \lambda_i^{b_i} \oplus \lambda_j^{b_j}$; this is correct since $w_i^{b_i} = w_i^0 \oplus b_i R$
(and similarly for $w_j^{b_j}$), where the notation $b_i R$ evaluates to $0^{|R|}$ if $b_i = 0$ or to
R otherwise, and thus

$$w_i^{b_i} \oplus w_j^{b_j} = w_i^0 \oplus w_j^0 \oplus (b_i \oplus b_j) R = w_k^0 \oplus (b_i \oplus b_j) R = w_k^{b_i \oplus b_j}$$

and

$$\lambda_i^{b_i} \oplus \lambda_j^{b_j} = (b_i \oplus \pi_i) \oplus (b_j \oplus \pi_j) = (b_i \oplus b_j) \oplus \pi_k = \lambda_k^{b_i \oplus b_j}.$$


Note that by doing so, XOR gates incur no communication and require no cryp- 
tographic operations by either party. For the remaining (non-XOR) gates in the 
circuit, the circuit generator prepares garbled gates as in the previous section. 
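
The key relations for XOR gates can be checked mechanically. The following Python sketch is only an illustration under our own assumptions: the key length N, the tuple layout of a wire, and all function names are hypothetical and do not come from [26].

```python
import secrets

N = 16  # key length in bytes (illustrative choice, not from the paper)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def new_wire(R: bytes):
    """Wire that is not an XOR output: keys (w^0, w^0 xor R) and a random bit pi."""
    w0 = secrets.token_bytes(N)
    return (w0, xor(w0, R)), secrets.randbelow(2)


def xor_output_wire(wire_i, wire_j, R: bytes):
    """Output wire k of an XOR gate: w_k^0 = w_i^0 xor w_j^0, pi_k = pi_i xor pi_j."""
    (keys_i, pi_i), (keys_j, pi_j) = wire_i, wire_j
    w0 = xor(keys_i[0], keys_j[0])
    return (w0, xor(w0, R)), pi_i ^ pi_j


def eval_xor(key_i, label_i, key_j, label_j):
    """Evaluator's work at an XOR gate: two XORs, no hashing, no ciphertexts."""
    return xor(key_i, key_j), label_i ^ label_j


def check_all_inputs() -> bool:
    """Verify the two displayed identities for every choice of (b_i, b_j)."""
    R = secrets.token_bytes(N)
    wire_i, wire_j = new_wire(R), new_wire(R)
    keys_k, pi_k = xor_output_wire(wire_i, wire_j, R)
    for b_i in (0, 1):
        for b_j in (0, 1):
            key, label = eval_xor(wire_i[0][b_i], b_i ^ wire_i[1],
                                  wire_j[0][b_j], b_j ^ wire_j[1])
            if key != keys_k[b_i ^ b_j] or label != (b_i ^ b_j) ^ pi_k:
                return False
    return True
```

Calling `check_all_inputs()` exercises all four input combinations of a single XOR gate.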

As previously, the above defines a protocol for secure two-party computa- 
tion in the OT-hybrid model. Kolesnikov and Schneider suggest to implement 
encryption using a cryptographic hash function H as follows: 


$$\mathrm{Enc}^g_{w,w'}(m) = H(w \| w' \| g) \oplus m.$$


When encryption is instantiated in this way, Kolesnikov and Schneider prove 
security of their protocol, for semi-honest adversaries, when H is modeled as 
a random oracle. They also claimed that security would hold if H satisfies 
some “variant” of correlation robustness. While they did not specify precisely 
what variant of correlation robustness is needed, a natural approach would be 
that they require the (joint) pseudorandomness of $H(w \| w' \| g)$, $H(w \oplus R \| w' \| g)$,
$H(w \| w' \oplus R \| g)$, $H(w \oplus R \| w' \oplus R \| g)$ for w, w’, R chosen at random. We discuss
this issue further in the following section. 
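
To make the hash-based instantiation concrete, here is a minimal Python sketch of garbling and evaluating one non-XOR gate with point-and-permute. It rests on several assumptions of ours: truncated SHA-256 stands in for H, the label is stored in a whole byte rather than a single bit, and the names `garble_gate` and `eval_gate` are hypothetical.

```python
import hashlib
import secrets

N = 16  # key length in bytes (illustrative choice)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def H(w: bytes, w2: bytes, g: int) -> bytes:
    """Stand-in hash (assumption): SHA-256 of w || w' || g, truncated to N + 1 bytes."""
    return hashlib.sha256(w + w2 + g.to_bytes(4, "big")).digest()[:N + 1]


def enc(w: bytes, w2: bytes, g: int, m: bytes) -> bytes:
    """Enc^g_{w,w'}(m) = H(w || w' || g) xor m; decryption is the same operation."""
    return xor(H(w, w2, g), m)


def new_wire(R: bytes):
    w0 = secrets.token_bytes(N)
    return (w0, xor(w0, R)), secrets.randbelow(2)


def garble_gate(gate, g, wire_i, wire_j, wire_k):
    """Four ciphertexts, stored at position (lambda_i, lambda_j) as in (1)-(4)."""
    (wi, pi_i), (wj, pi_j), (wk, pi_k) = wire_i, wire_j, wire_k
    table = [None] * 4
    for b_i in (0, 1):
        for b_j in (0, 1):
            out = gate(b_i, b_j)
            plaintext = wk[out] + bytes([out ^ pi_k])  # key || label
            row = 2 * (b_i ^ pi_i) + (b_j ^ pi_j)
            table[row] = enc(wi[b_i], wj[b_j], g, plaintext)
    return table


def eval_gate(table, g, key_i, label_i, key_j, label_j):
    """Decrypt only the row selected by the two labels."""
    plaintext = enc(key_i, key_j, g, table[2 * label_i + label_j])
    return plaintext[:N], plaintext[N]


def check_and_gate() -> bool:
    R = secrets.token_bytes(N)
    wi, wj, wk = new_wire(R), new_wire(R), new_wire(R)
    table = garble_gate(lambda a, b: a & b, 1, wi, wj, wk)
    for b_i in (0, 1):
        for b_j in (0, 1):
            key, label = eval_gate(table, 1, wi[0][b_i], b_i ^ wi[1],
                                   wj[0][b_j], b_j ^ wj[1])
            if key != wk[0][b_i & b_j] or label != (b_i & b_j) ^ wk[1]:
                return False
    return True
```

The sketch says nothing about security; it only shows that the evaluator recovers the correct output key and label from a single decryption.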


2.3 Correlation-Robust Hash Functions 


As noted at the end of the previous section, Kolesnikov and Schneider claim 
that some variant of correlation robustness would be sufficient to prove security 
of the free-XOR construction. Let $H = \{H_n : \{0,1\}^{\ell_{in}(n)} \to \{0,1\}^{\ell_{out}(n)}\}$ be a
family of hash functions, where for simplicity we write H instead of $H_n$ when
the security parameter n is understood. Correlation robustness was defined by 
Ishai et al. [21] as follows: 


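Definition 1. H is correlation robust if for any polynomial p(·) and any non-uniform
polynomial-time distinguisher A, the following is negligible in the security
parameter n:

$$\left| \Pr_{w_1,\ldots,w_p,\,R \leftarrow \{0,1\}^{\ell_{in}(n)}}\bigl[A(w_1,\ldots,w_p, H(w_1 \oplus R),\ldots,H(w_p \oplus R)) = 1\bigr]
- \Pr_{w_1,\ldots,w_p \leftarrow \{0,1\}^{\ell_{in}(n)};\; u_1,\ldots,u_p \leftarrow \{0,1\}^{\ell_{out}(n)}}\bigl[A(w_1,\ldots,w_p,u_1,\ldots,u_p) = 1\bigr] \right|,$$

where p = p(n).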
In the context of the free-XOR technique as defined by Kolesnikov and Schneider,
an appropriate definition of correlation robustness needs to at least capture
the security requirement (informally) that given any pair of keys $w_i^{b_i}, w_j^{b_j}$
for some garbled gate constructed as in Equations (1)-(4), with $\mathrm{Enc}^g_{w,w'}(m) =
H(w \| w' \| g) \oplus m$, it should be possible to decrypt only one row while the others
remain hidden. Since the hash function H now takes three inputs, the definition
of correlation robustness needs to be modified appropriately. Moreover, for the
free-XOR approach it appears necessary to allow the $w_i$ to take on arbitrary values (see footnote 2)
rather than being chosen uniformly and independently at random; roughly, this
is because in the free-XOR construction we have $w_k^0 = w_i^0 \oplus w_j^0$ when k is the
output wire of an XOR gate with input wires i, j, and so $w_i^0, w_j^0, w_k^0$ are not
independent. We capture these requirements in the following definition.


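Definition 2. $H : \{0,1\}^{3\ell_{in}(n)} \to \{0,1\}^{\ell_{out}(n)}$ is (weakly) 2-correlation robust if
for all polynomials p(·) the distribution ensemble

$$\left\{ R \leftarrow \{0,1\}^{\ell_{in}(n)} :
\begin{array}{c}
H(w_1 \oplus R \| w'_1 \| 1),\; H(w_1 \| w'_1 \oplus R \| 1),\; H(w_1 \oplus R \| w'_1 \oplus R \| 1) \\
\vdots \\
H(w_p \oplus R \| w'_p \| p),\; H(w_p \| w'_p \oplus R \| p),\; H(w_p \oplus R \| w'_p \oplus R \| p)
\end{array}
\right\}_{w_1,\ldots,w_p,\; w'_1,\ldots,w'_p \in \{0,1\}^{\ell_{in}(n)}}$$

is computationally indistinguishable from the uniform distribution over
$\{0,1\}^{3p \cdot \ell_{out}(n)}$ (in both cases, p = p(n)).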


Simplified to the case p = 1 with $w_1, w'_1$ chosen uniformly and independently
(and ignoring the last input to H), the definition requires that the values

$$H(w_1 \oplus R \| w'_1),\quad H(w_1 \| w'_1 \oplus R),\quad H(w_1 \oplus R \| w'_1 \oplus R)$$

be jointly pseudorandom even given $w_1, w'_1$. Note that this is equivalent to, say,
requiring that

$$H(w_1 \| w'_1),\quad H(w_1 \oplus R \| w'_1 \oplus R),\quad H(w_1 \| w'_1 \oplus R)$$

be jointly pseudorandom given $w_1 \oplus R, w'_1$, and thus may appear to capture the
requirements necessary for proving the free-XOR technique secure.

It will be more convenient to rephrase the above as an oracle-based definition,
and this also provides a point of departure for the definition we will propose in
Section 4. (In fact, the oracle-based definition we give is stronger than Definition 2,
as it allows the adversary to adaptively choose $w_i, w'_i$ based on previous
outputs of H. But see footnote 2.) Fixing some H, define oracles $\mathrm{Corr}_R(\cdot,\cdot,\cdot)$ and
$\mathrm{Rand}(\cdot,\cdot,\cdot)$ as follows:


2 We will show impossibility of proving security based on correlation robustness alone. 
Thus, a stronger definition of correlation robustness only strengthens that result. 
— $\mathrm{Corr}_R(w, w', g)$: output $H(w \| w' \oplus R \| g)$, $H(w \oplus R \| w' \| g)$, and $H(w \oplus R \| w' \oplus R \| g)$.
— $\mathrm{Rand}(w, w', g)$: if this input was queried before then return the answer given
previously. Otherwise choose $u \leftarrow \{0,1\}^{3\ell_{out}(n)}$ and return u.
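
The two oracles can be sketched in a few lines of Python. This is an illustration under our own assumptions: truncated SHA-256 stands in for H, lengths are measured in bytes, and the class names are hypothetical.

```python
import hashlib
import secrets

N = 16        # plays the role of l_in, in bytes (illustrative)
OUT = N + 1   # plays the role of l_out, in bytes (illustrative)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def H(w: bytes, w2: bytes, g: int) -> bytes:
    """Stand-in hash (assumption): truncated SHA-256 of w || w' || g."""
    return hashlib.sha256(w + w2 + g.to_bytes(4, "big")).digest()[:OUT]


class CorrOracle:
    """Corr_R: answers with H at the three R-shifted points; R stays hidden."""

    def __init__(self):
        self._R = secrets.token_bytes(N)

    def query(self, w: bytes, w2: bytes, g: int):
        R = self._R
        return (H(w, xor(w2, R), g),
                H(xor(w, R), w2, g),
                H(xor(w, R), xor(w2, R), g))


class RandOracle:
    """Rand: fresh uniform answers, repeated consistently on repeated queries."""

    def __init__(self):
        self._answers = {}

    def query(self, w: bytes, w2: bytes, g: int):
        key = (w, w2, g)
        if key not in self._answers:
            self._answers[key] = tuple(secrets.token_bytes(OUT) for _ in range(3))
        return self._answers[key]
```

A distinguisher in the sense of the definition that follows would be handed one of these two objects without being told which.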


We now have the following definition: 


Definition 3. H is 2-correlation robust if for all non-uniform polynomial-time 
distinguishers A the following is negligible: 


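$$\left| \Pr\bigl[R \leftarrow \{0,1\}^{\ell_{in}(n)} : A^{\mathrm{Corr}_R(\cdot)}(1^n) = 1\bigr] - \Pr\bigl[A^{\mathrm{Rand}(\cdot)}(1^n) = 1\bigr] \right|.$$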


3 Insufficiency of Correlation Robustness 


In this section, we show that 2-correlation robustness is not enough to prove 
security of the free-XOR technique. We start by describing where the natural 
attempt to prove security (following, e.g., the proof of [27]) fails. We then show 
a construction (in the random-oracle model) of a hash function H that satisfies 
Definition 3 but for which the free-XOR approach is demonstrably insecure when
instantiated using H. 


3.1 Where the Reduction Fails 


Consider the case where the circuit consists of just a single AND gate, with input 
wires 1 and 2 (belonging to the circuit generator and evaluator, respectively) 
and output wire 3. Say the circuit evaluator has input 0 and so receives key $w_2^0$;
assume for concreteness that the circuit generator has input 0 as well and so the
circuit evaluator is also given key $w_1^0$. (The circuit evaluator will also be given
the corresponding labels, but these can be left implicit in what follows.) The 
garbled gate consists of the values 


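$$\begin{aligned}
&H(w_1^0 \| w_2^0 \| 1) \oplus (w_3^0 \| 0) \\
&H(w_1^0 \| w_2^0 \oplus R \| 1) \oplus (w_3^0 \| 0) \\
&H(w_1^0 \oplus R \| w_2^0 \| 1) \oplus (w_3^0 \| 0) \\
&H(w_1^0 \oplus R \| w_2^0 \oplus R \| 1) \oplus ((w_3^0 \oplus R) \| 1)
\end{aligned}$$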


in some permuted order, for some random value R unknown to the circuit evaluator.
(Recall that $w_i^1 = w_i^0 \oplus R$ for all i, by construction, when using the free-XOR
approach.) The evaluator will be able to decrypt the first row, above, to learn
the output; it should not, however, be able to learn any information about the
remaining three rows. (In particular, it should not learn whether the other party
had input 0 or 1.) The natural way to try to prove security of the above is to argue
that the remaining rows are pseudorandom, by reduction to the 2-correlation
robustness of H. In the reduction, we would have an adversary A given access
to an oracle O that is either $\mathrm{Corr}_R$ (for a random R) or Rand. The adversary
A can choose random $w_1^0, w_2^0$, and then query $O(w_1^0, w_2^0, 1)$ to obtain three values
$h_1, h_2, h_3$ that are either completely random or equal to $H(w_1^0 \| w_2^0 \oplus R \| 1)$,
$H(w_1^0 \oplus R \| w_2^0 \| 1)$, and $H(w_1^0 \oplus R \| w_2^0 \oplus R \| 1)$. But A cannot complete the simulation,
since it has no way to compute values of the form

$$h_1 \oplus (w_3^0 \| 0),\quad h_2 \oplus (w_3^0 \| 0),\quad h_3 \oplus ((w_3^0 \oplus R) \| 1)$$

(since A does not know R) as would be necessary to simulate the remaining
three rows of the garbled gate in case $O = \mathrm{Corr}_R$.

We show in the next section that this is not just a failure of this particular 
proof approach, since we can construct a hash function H that satisfies Definition 3
yet for which the free-XOR methodology is demonstrably insecure when
instantiated using H. 


3.2 A Counter-Example 


For simplicity, we fix a value of the security parameter n. Assume further that the 
last input to H (i.e., the gate index g) is represented using n bits. We construct 
a pair of oracles $H : \{0,1\}^{3n} \to \{0,1\}^{n+1}$ and $\mathrm{Break} : \{0,1\}^{6n+3} \to \{0,1\}^n$ such
that: 


— H satisfies Definition 3 even if the distinguisher A is given oracle access to
both H and Break. 

— The free-XOR methodology is demonstrably insecure when instantiated us- 
ing H, against an adversary given oracle access to both H and Break. 


Thus, we rule out a fully black-box reduction of the security of the free-XOR 
technique to 2-correlation robustness. 

Let $H : \{0,1\}^{3n} \to \{0,1\}^{n+1}$ be a random function, and define Break as
follows: 


$\mathrm{Break}(w \| w' \| g \| z_1 \| z_2 \| z_3)$: If there exists $r \in \{0,1\}^n$ such that
$z_1 = H(w \| w' \oplus r \| g)$, $z_2 = H(w \oplus r \| w' \| g)$, and $z_3 = H(w \oplus r \| w' \oplus r \| g) \oplus (r \| 0)$,
then output r (if multiple values of r satisfy the above, take the lexicographically
smallest one); otherwise, output $\bot$.


We now prove the above claims. 


Lemma 1. H is 2-correlation robust, even when the distinguisher is given oracle 
access to both H and Break. 


Proof (Sketch). Fix a polynomial-time distinguisher A who is given access to 
H, Break, O where either $O = \mathrm{Corr}_R$ (for random $R \in \{0,1\}^n$) or O = Rand.
Without loss of generality, we assume that A does not repeat queries to O.
When O = Rand, every query to O is answered with a string of length 3·(n+1)
that is uniform and independent of A’s view. When $O = \mathrm{Corr}_R$, every query
O(w, w’, g) is answered with a string of length 3·(n+1) that is uniform and
independent of A’s view unless one of the following is true: 
— A at some point queries $O(\hat{w}, \hat{w}', g)$ with $\hat{w} \oplus w = R$ or $\hat{w}' \oplus w' = R$ (or both).

— A at some point queries $H(\hat{w} \| \hat{w}' \| g)$ with $\hat{w} \oplus w = R$ or $\hat{w}' \oplus w' = R$ (or both).

— A at some point queries $\mathrm{Break}(\hat{w} \| \hat{w}' \| g \| z_1 \| z_2 \| z_3)$ where it holds that
$R \| 0 = z_3 \oplus H(\hat{w} \oplus R \| \hat{w}' \oplus R \| g)$.


Since R is chosen uniformly from $\{0,1\}^n$, the probability that A makes any
queries of the above form is negligible. 


Lemma 2. The free-XOR construction, when instantiated using H, is not se- 
cure against a semi-honest adversary (with oracle access to H and Break) who 
corrupts the circuit evaluator. 


Proof. We show that a semi-honest adversary can recover R with high probabil- 
ity. Since a semi-honest adversary can (legitimately) recover one key per wire by 
following the protocol, knowledge of R allows the adversary to recover both keys 
for every wire in the circuit; thus, this suffices to show that the construction is 
insecure. 

Assume the first gate in the circuit is an AND gate with input wires 1 
and 2 (belonging to the circuit generator and evaluator, respectively) and output 
wire 3. Say the circuit evaluator has input 0 and so receives key $w_2^0$; assume for
concreteness that the circuit generator has input 0 as well and so the circuit evaluator
is also given key $w_1^0$. With constant probability we have $\pi_1 = \pi_2 = \pi_3 = 0$,
and in that case the garbled gate consists of the values

$$\begin{aligned}
c_{00} &= H(w_1^0 \| w_2^0 \| 1) \oplus (w_3^0 \| 0) \\
c_{01} &= H(w_1^0 \| w_2^0 \oplus R \| 1) \oplus (w_3^0 \| 0) \\
c_{10} &= H(w_1^0 \oplus R \| w_2^0 \| 1) \oplus (w_3^0 \| 0) \\
c_{11} &= H(w_1^0 \oplus R \| w_2^0 \oplus R \| 1) \oplus ((w_3^0 \oplus R) \| 1).
\end{aligned}$$

The circuit evaluator can compute $w_3^0$ from $c_{00}$ (as directed by the protocol). It
then computes

$$z_1 = c_{01} \oplus (w_3^0 \| 0), \qquad z_2 = c_{10} \oplus (w_3^0 \| 0), \qquad z_3 = c_{11} \oplus (w_3^0 \| 1)$$

and queries $\mathrm{Break}(w_1^0 \| w_2^0 \| 1 \| z_1 \| z_2 \| z_3)$. If the answer is some value $R' \neq \bot$ then
with overwhelming probability it holds that R’ = R. (Correctness of R can also
be verified by looking at a second garbled gate with known inputs.)
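
The attack can be replayed on toy parameters. The sketch below is not part of the proof and rests on assumptions of ours: n is set to 16 bits so that Break can be realized by brute force, truncated SHA-256 stands in for the random function H, bit strings are encoded as Python integers (so that m‖b becomes `(m << 1) | b`), and all function names are hypothetical.

```python
import hashlib

N = 16  # toy security parameter in bits, small enough for brute force


def H(w: int, w2: int, g: int) -> int:
    """Stand-in for the random function H: {0,1}^{3N} -> {0,1}^{N+1} (assumption)."""
    data = b"".join(v.to_bytes(4, "big") for v in (w, w2, g))
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - (N + 1))  # keep the top N + 1 bits


def break_oracle(w, w2, g, z1, z2, z3):
    """Break: smallest r consistent with (z1, z2, z3), or None in place of 'bottom'."""
    for r in range(1 << N):
        if (z1 == H(w, w2 ^ r, g)
                and z2 == H(w ^ r, w2, g)
                and z3 == H(w ^ r, w2 ^ r, g) ^ (r << 1)):  # r || 0
            return r
    return None


def garble_and_gate(R, w1, w2, w3, g=1):
    """Garbled AND gate for pi_1 = pi_2 = pi_3 = 0; w1, w2, w3 are the 0-keys."""
    c00 = H(w1, w2, g) ^ (w3 << 1)
    c01 = H(w1, w2 ^ R, g) ^ (w3 << 1)
    c10 = H(w1 ^ R, w2, g) ^ (w3 << 1)
    c11 = H(w1 ^ R, w2 ^ R, g) ^ (((w3 ^ R) << 1) | 1)
    return c00, c01, c10, c11


def recover_R(w1, w2, table, g=1):
    """Semi-honest evaluator holding w1, w2: decrypt c00, then query Break."""
    c00, c01, c10, c11 = table
    w3 = (H(w1, w2, g) ^ c00) >> 1   # strip the label bit
    z1 = c01 ^ (w3 << 1)
    z2 = c10 ^ (w3 << 1)
    z3 = c11 ^ ((w3 << 1) | 1)
    return break_oracle(w1, w2, g, z1, z2, z3)
```

For example, `recover_R(w1, w2, garble_and_gate(R, w1, w2, w3))` returns R except in the unlikely event that a smaller r happens to satisfy all three equations.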


4 Proving Security of the Free-XOR Approach 


The essence of the problem(s) described in the previous section is that there is
a previously unnoticed circularity in the free-XOR approach, since in the general
case both $H(w_1 \| w_2 \| g) \oplus w_3$ and $H(w_1 \oplus R \| w_2 \oplus R \| g) \oplus (w_3 \oplus R)$ are revealed
to the adversary. (Recall that R is the hidden secret here.) In this section, we
introduce a new security definition that explicitly takes this circularity into account,
and show that this definition suffices to prove security of the free-XOR
approach.

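Fix some function $H : \{0,1\}^{3\ell_{in}(n)} \to \{0,1\}^{\ell_{out}(n)}$. We define an oracle $\mathrm{Circ}_R$
as follows (see footnote 3):


— $\mathrm{Circ}_R(w, w', g, b_1, b_2, b_3)$ outputs $H(w \oplus b_1 R \,\|\, w' \oplus b_2 R \,\|\, g) \oplus b_3 R$.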


To see the connection with the previous definition (in the context of correlation
robustness), note that oracle $\mathrm{Corr}_R(w, w', g)$ defined previously outputs
$\mathrm{Circ}_R(w, w', g, 0, 1, 0)$, $\mathrm{Circ}_R(w, w', g, 1, 0, 0)$, and $\mathrm{Circ}_R(w, w', g, 1, 1, 0)$; i.e., $b_3$
was fixed to 0 there. The possibility of $b_3 = 1$ is exactly what models circularity
involving R.

Corresponding to the above we define an oracle Rand in a way analogous to 
before: 


— $\mathrm{Rand}(w, w', g, b_1, b_2, b_3)$: if this input was queried before then return the answer
given previously. Otherwise choose $u \leftarrow \{0,1\}^{\ell_{out}(n)}$ and return u.


In our new definition of security for H, we are going to require that oracles $\mathrm{Circ}_R$
(for random R) and Rand be indistinguishable. This cannot possibly be true,
however, unless we rule out some trivial queries that can be used to distinguish
them. Let O be the oracle to which a distinguisher is given access, where either
$O = \mathrm{Circ}_R$ or O = Rand. We must restrict the distinguisher as follows: (1) it is not
allowed to make any query of the form $O(w, w', g, 0, 0, b_3)$ (since it can compute
$H(w \| w' \| g)$ on its own) and (2) it is not allowed to query both $O(w, w', g, b_1, b_2, 0)$
and $O(w, w', g, b_1, b_2, 1)$ for any values $w, w', g, b_1, b_2$ (since that would allow it to
trivially recover R). We say that any distinguisher respecting these restrictions
makes legal queries.
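
The oracle and the two legality restrictions can be sketched as follows. The sketch makes assumptions of ours: truncated SHA-256 stands in for H, lengths are in bytes, the class name is hypothetical, and (as a design choice for illustration only) the oracle itself rejects illegal queries, whereas the text places the restriction on the distinguisher.

```python
import hashlib
import secrets

N = 16        # plays the role of l_in, in bytes (illustrative)
OUT = N + 1   # plays the role of l_out, in bytes (illustrative)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def H(w: bytes, w2: bytes, g: int) -> bytes:
    """Stand-in hash (assumption): truncated SHA-256 of w || w' || g."""
    return hashlib.sha256(w + w2 + g.to_bytes(4, "big")).digest()[:OUT]


class CircOracle:
    """Circ_R(w, w', g, b1, b2, b3) = H(w xor b1*R || w' xor b2*R || g) xor b3*R."""

    def __init__(self):
        self._R = secrets.token_bytes(N)
        self._seen = {}  # (w, w', g, b1, b2) -> the b3 already used

    def query(self, w, w2, g, b1, b2, b3):
        # Restriction (1): b1 = b2 = 0 would let the caller compare with H directly.
        if (b1, b2) == (0, 0):
            raise ValueError("illegal query: b1 = b2 = 0")
        # Restriction (2): both values of b3 for the same prefix would reveal R.
        if self._seen.setdefault((w, w2, g, b1, b2), b3) != b3:
            raise ValueError("illegal query: both values of b3 requested")
        R, zero = self._R, bytes(N)
        h = H(xor(w, R if b1 else zero), xor(w2, R if b2 else zero), g)
        pad = (R if b3 else zero) + bytes(OUT - N)  # b3*R, zero-padded to l_out
        return xor(h, pad)
```

With $b_3 = 0$ the three legal choices of $(b_1, b_2)$ reproduce the outputs of $\mathrm{Corr}_R$.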

With this in place we can now define our notion of circular 2-correlation 
robustness. 


Definition 4. H is circular 2-correlation robust if for any non-uniform 
polynomial-time distinguisher A making legal queries to its oracle, the follow- 
ing is negligible: 


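$$\left| \Pr\bigl[R \leftarrow \{0,1\}^{\ell_{in}(n)} : A^{\mathrm{Circ}_R(\cdot)}(1^n) = 1\bigr] - \Pr\bigl[A^{\mathrm{Rand}(\cdot)}(1^n) = 1\bigr] \right|.$$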


Next, we show that this notion of security suffices to prove security of the free- 
XOR approach: 


Theorem 1. Consider the protocol described in Section 2.2 for two-party computation
in the OT-hybrid model. If H as used there is circular 2-correlation
robust, then the resulting protocol is secure against a semi-honest adversary. 


3 Here, we slightly abuse the notation bR so that it evaluates to $0^{\ell_{out}(n)}$ if b = 0 or
$R \| 0^{\ell_{out}(n) - \ell_{in}(n)}$ otherwise.
Proof. The case where the circuit generator is corrupted is trivial. Therefore, we
consider corruption of the circuit evaluator B. We describe a simulator who is 
given the input of B and the output z € {0,1} of evaluating the function, and 
must provide B with a simulated garbled circuit that is indistinguishable from 
the actual one that would be sent during a real execution of the protocol. The 
high-level idea is exactly the same as in [27]; the crucial difference is that we 
reduce to circular 2-correlation robustness of H. 
The simulator proceeds as follows: 


1. For each wire i in the circuit that is not an output wire of an XOR gate,
choose $w_i \leftarrow \{0,1\}^n$ and $\lambda_i \leftarrow \{0,1\}$.

2. For each wire k in the circuit that is the output wire of an XOR gate with
input wires i, j (for which $w_i, \lambda_i, w_j, \lambda_j$ have already been defined), set $w_k =
w_i \oplus w_j$ and $\lambda_k = \lambda_i \oplus \lambda_j$.

3. For each non-XOR gate g in the circuit with input wires i, j and output
wire k, output the four ciphertexts $c_{00}, c_{01}, c_{10}$, and $c_{11}$ as the corresponding
garbled gate, where $c_{\lambda_i \lambda_j} = H(w_i \| w_j \| g) \oplus (w_k \| \lambda_k)$ and the remaining three
ciphertexts are uniform strings of length n + 1.

4. For the output wire o, set $\pi_o = \lambda_o \oplus z$.


Say $i_1, \ldots, i_\ell$ are the input wires of the circuit belonging to the circuit generator,
and $j_1, \ldots, j_\ell$ are the input wires belonging to the circuit evaluator. The
simulator gives to B the values $w_{j_1}, \ldots, w_{j_\ell}$ (as if they came from the calls to
the OT functionality), and the simulated communication that includes (1) the
keys $w_{i_1}, \ldots, w_{i_\ell}$, (2) the garbled gate for each non-XOR gate in the circuit, and
(3) the value $\pi_o$ corresponding to the output wire.

We claim that the simulated view is indistinguishable from the real-world 
execution of the protocol. Assume there is an adversary B who can distinguish 
the two distributions when the inputs to the parties are x and y, respectively, and 
the output is z. We show an adversary A who breaks the circular 2-correlation 
robustness of H. Given access to an oracle O (that is either Circ or Rand),
adversary A does as follows: 


1. For each wire i in the circuit that is not an output wire of an XOR gate,
choose $w_i \leftarrow \{0,1\}^n$ and $\lambda_i \leftarrow \{0,1\}$.

2. For each wire k in the circuit that is the output wire of an XOR gate with
input wires i, j (for which $w_i, \lambda_i, w_j, \lambda_j$ have already been defined), set
$w_k = w_i \oplus w_j$ and $\lambda_k = \lambda_i \oplus \lambda_j$.

3. For each wire i, let $b_i \in \{0,1\}$ be the actual value on wire i; this can be
determined since A knows the actual input (x, y) to the circuit. Set $w_i^{b_i} = w_i$,
$\pi_i = \lambda_i \oplus b_i$, $\lambda_i^0 = \pi_i$, and $\lambda_i^1 = 1 \oplus \pi_i$ (i.e., only the $w_i^{\bar{b}_i}$'s are left undefined).

4. For each non-XOR gate g in the circuit with input wires i, j and output
wire k, output the four ciphertexts $c_{00}, c_{01}, c_{10}, c_{11}$ as the corresponding
garbled gate, where these ciphertexts are constructed as follows:

— $c_{\lambda_i \lambda_j} = H(w_i \| w_j \| g) \oplus (w_k \| \lambda_k)$.

— For $(\beta_i, \beta_j) \in \{0,1\}^2$ with $(\beta_i, \beta_j) \neq (b_i, b_j)$, query
$h_{\beta_i,\beta_j} = O(w_i, w_j, g, \beta_i \oplus b_i, \beta_j \oplus b_j, g(\beta_i, \beta_j) \oplus b_k)$,
and set $c_{\lambda_i^{\beta_i} \lambda_j^{\beta_j}} = h_{\beta_i,\beta_j} \oplus (w_k \| \lambda_k^{g(\beta_i,\beta_j)})$.

5. For the output wire o, set $\tau_o = \lambda_o \oplus z$ (where z is the known output of the
circuit).



A gives to B the values $w_{j_1}, \ldots, w_{j_\ell}$ (as if they came from the calls to the OT
functionality), and (1) the keys $w_{i_1}, \ldots, w_{i_\ell}$, (2) the garbled gate for each
non-XOR gate in the circuit, and (3) the value $\tau_o$ corresponding to the output wire.
Finally, A outputs whatever B outputs. It is easy to see that A makes legal 
queries to its oracle. Furthermore, it is also easy to see that if O = Circ the view 
of B is identically distributed to its view in the real execution of the protocol on 
the given inputs, whereas if O = Rand the view of B is distributed identically to 
the output of the simulator described previously. This completes the proof. 


5 Conclusion 


The free-XOR technique has been extremely influential, and it is currently used 
in all implementations of the garbled-circuit technique because of the speedup 
that it gives. It was previously known that this approach is secure in the random- 
oracle model; it was also claimed that some variant of correlation robustness 
would suffice to prove security, but the exact notion of correlation robustness 
needed was left unspecified. In this work, we explore this question. We show that 
the natural variant of correlation robustness (extended to handle hash functions 
taking several inputs, rather than one input) is not sufficient, and identify a 
previously unnoticed circularity in the free-XOR construction that causes the 
difficulty. We are thus motivated to propose a new, stronger notion of correlation 
robustness, and we prove that this notion suffices. 

Several intriguing open questions remain. First, is there some variant of the 
free-XOR approach that does not rely on any assumptions beyond CPA-secure 
encryption (which is all that is needed to prove security of classical garbled- 
circuit protocols in the OT-hybrid world)? Alternately, can our definition of 
circular 2-correlation robustness be realized from standard cryptographic as- 
sumptions? 


Acknowledgments. The second author would like to thank Vlad Kolesnikov 
Abstract. We propose a 2-party UC-secure protocol that can compute
any function securely. The protocol requires only two messages, commu- 
nication that is poly-logarithmic in the size of the circuit description of 
the function, and the workload for one of the parties is also only poly- 
logarithmic in the size of the circuit. This implies, for instance, delegat- 
able computation that requires no expensive off-line phase and remains 
secure even if the server learns whether the client accepts its results. To 
achieve this, we define two new notions of extractable hash functions, 
propose an instantiation based on the knowledge of exponent in an RSA 
group, and build succinct zero-knowledge arguments in the CRS model. 


1 Introduction 


In the setting of secure two-party computation, two parties with private inputs 
wish to jointly compute some function of their inputs while preserving certain 
security properties like privacy, correctness and more. Despite the stringent re- 
quirements of the standard simulation-based security definitions [Can00], 
it has been shown that any probabilistic polynomial-time two-party functionality 
can be computed securely against malicious adversaries [Yao86, GMW87, Gol04].
Following these feasibility results many constructions have been proposed to
improve the efficiency of the computation [IPS09, PSSW09, NO09, LP11, IKO+11].
A recent work by Gordon et al. shows an approach using oblivious 
RAM, with polylogarithmic amortized workload overhead. The best round com- 
plexity is obtained by who show a single round protocol in 
the non-interactive setting. For a general study of multiparty computation with 
minimal round complexity, see KP 10}. 

The communication complexity of these constructions depends heavily on the 
size of the computed circuit. To the best of our knowledge, all works that try to 
minimize the communication complexity do so for particular tasks of interest
such as private information retrieval (PIR) or functions captured by
branching programs and random access memory machines [NN01]. In all these
constructions, the parties do essentially the same amount of work, namely at least 
the amount of work needed to evaluate the specified circuit. Such constructions 
are appropriate for settings in which the parties are equally powerful, and offer 
no solution for “asymmetric settings” in which one of the devices is strictly 
(computationally) weaker than the other (e.g., smartcards, mobile devices). In 
this paper we will be interested in solutions for such asymmetric settings, so we 
want to minimize the workload for one of the parties. 

For semi-honest attacks, fully homomorphic encryption can 
be used to design a simple one round protocol with sublinear communication 
complexity. Here one party, say P1, sends its encrypted input to party P2, who
uses the homomorphic property to compute ciphertexts that contain the desired
output. These ciphertexts are sent to P1, who can decrypt and learn the result.
Obviously, this solution breaks down under malicious attacks. The natural fix
is to have P2 give a non-interactive zero-knowledge proof (NIZK) that
his response is correct, but this will not solve our problem. Even though such a
proof can be made very short [Gro11], P1 would have to work as hard as P2 to
check the NIZK, and hence the computational complexity for both parties would 
be linear in the circuit description of the function to compute. This does not fit 
our scenario where we want to minimize the work for one party. 

To reach our goal, one needs a protocol by which a prover can give a short 
zero-knowledge argument for an NP statement, where the verifier only needs to 
do a small amount of work. More precisely, the amount of work needed for the 
verifier is polynomial in the security parameter and the size of the statement 
but only poly-logarithmic in the time needed to check a witness in the standard 
way. Such proofs or arguments are usually called succinct. The history of such 
protocols starts with the work of Kilian who suggested the idea of having 
the prover commit to a PCP for the statement in question using a Merkle hash 
tree, and then have the verifier (obliviously) check selected bits from the PCP. 
This protocol is succinct and zero-knowledge but requires several rounds and so 
cannot be used towards our goal of a 2-message protocol. Subsequent work in this 
direction has concentrated on protocols where only a succinct non-interactive 
argument (and not zero-knowledge) is required. This is known as a SNARG.
Micali suggested a one-message solution based on Kilian’s protocol and
the Fiat-Shamir heuristic. Aiello et al. suggested a two-message
protocol where the verifier accesses bits of the PCP via a private information 
retrieval scheme (PIR). In such a scheme a client can retrieve an entry in a 
database held by a server without the server learning which entry was accessed. 
It seems intuitively appealing that if the prover does not know which bits of the 
PCP the verifier is looking at, soundness of the PCP should imply soundness 
of the overall argument. However, it was shown that this intuition
is not sound. Di Crescenzo and Lipmaa [CL08] suggested a solution where the
prover commits to a PCP using the root of a Merkle tree as in Kilian’s protocol, 
but to prove security, they made a very strong type of extractability assumption 
implying extraction of an entire PCP from the prover in one go. 
Our Contribution. Compared to the work on SNARGs just discussed, our work
makes two contributions: first, we show how to achieve simulation based privacy 
also for the prover, even if the verifier is malicious. We need this since our goal 
is UC-secure 2-party computation and we must have privacy for both parties, 
even under malicious attacks. This is the reason we need a set-up assumption 
allowing parties to give non-interactive zero-knowledge proofs of knowledge of 
their inputs. Also, to get a zero-knowledge SNARG, we do not use the PCP+PIR
approach from earlier work for a general PIR; instead we build a PIR-like scheme
based on FHE, allowing the prover to compute NIZKs “inside the ciphertexts”.
Second, we suggest two notions of “extractable hash function” that are more 
natural and milder than the assumption of Di Crescenzo and Lipmaa but still 
allow succinct arguments. 

Based on these techniques we present a two-party protocol in the common 
reference string model that computes any PPT functionality f with UC security 
against malicious adversaries. Our protocol is the first to additionally achieve 
the following strong properties: Polylogarithmic communication complexity in 
the size of the circuit C that computes f. One round complexity, i.e., a single 
message in each direction. Polylogarithmic workload in the size of the circuit C 
that computes f, for one of the parties. Our protocol is based on fully homo- 
morphic encryption, non-interactive zero-knowledge proofs and the existence of 
extractable hash functions. While the first two notions are fairly standard, we 
explain in more detail the new notions of extractability: 

The first extractability assumption (EHF1) considers a collision intractable 
hash function H mapping into a small subset of a large domain and essentially 
asserts that the only way to generate an element in Im(H) is to compute the 
function on a given input. More precisely, we require that for every adversary 
outputting a value h there exists an efficient extractor that (given the same 
randomness) outputs a preimage of h, whenever h ∈ Im(H). We propose an
instantiation of EHF1-extractable and collision intractable hash functions based on
a knowledge of exponent assumption in $\mathbb{Z}_N^*$, where N is an RSA modulus.

The second extractability assumption (EHF2) makes a weaker demand on 
the hash function H: again we require that for each adversary outputting h, 
there exists an extractor that tries to find a preimage. This time, however, the 
extractor is allowed to fail even if h ∈ Im(H). The demand, however, is that
if the extractor fails, the adversary cannot find a preimage either, even if he 
continues his computation with fresh randomness and auxiliary data that was 
not known to the extractor. 

It is easy to verify that EHF1 implies EHF2: under EHF1, the extractor only 
fails if it is impossible to find a preimage. The more interesting direction is 
whether EHF2 implies EHF1. In the concurrent and independent work of Bitansky
et al. [BCCT11], they consider a variant of EHF1 where the hash function
has a stronger notion of collision intractability, so-called proximity collision re- 
sistance. They then show that proximity EHF1 is equivalent to proximity EHF2 
and furthermore existence of such functions is equivalent to the existence of 
succinct non-interactive arguments of knowledge (SNARKs). Whether our EHF2 notion
implies EHF1 is an interesting open question. 

Note that EHF2 is true in the random oracle model, where we let the random 
oracle play the role of H. In this case it is easy to see that no matter how the 
adversary produces a string h, there are only two cases: either h was output by 
the random oracle or not. In the former case a preimage is easy to extract, in the 
latter case no one can produce a preimage except with negligible probability. So 
the extractor can safely fail in this case. 

Finally, it is interesting to note that EHF2 opens the possibility to use many 
more candidate hash functions, whereas previously only rather slow functions 
based on number theoretic assumptions seemed to apply. This is because stan- 
dard hash functions such as SHA (are thought to) behave similarly to a random 
oracle, and such a function does not satisfy EHF1. However, using, e.g., the 
random oracle preserving EMD transform from [BR06], one may get interesting 
candidates for efficient functions satisfying EHF2. 

We wish to warn the reader that extractability assumptions are regarded as 
controversial by some; on the other hand such assumptions have recently been 
studied quite intensively [GLR11]. Moreover,
Gentry and Wichs have recently shown that SNARGs cannot be proven
secure via a black-box reduction to a falsifiable assumption [Nao03]. Even more
to the point, as mentioned above, [BCCT11] shows that the existence of SNARKs
implies the existence of extractable hash functions. This suggests that non-standard
assumptions such as knowledge of exponent are necessary in this setting and 
hence our construction is essentially tight. Finally, as we pointed out above, the 
EHF2 assumption is true in the random oracle model and is implied only by the 
fact that one must call the oracle to get a valid output. So we only use one of the 
many “magic properties” that the random oracle model has, and this particular 
one is in fact satisfied in the standard model, if our assumption holds. Therefore, 
we believe that the assumption on extractable hash functions should be regarded 
as much less controversial than using the random oracle model. 


Applications. Variants of our construction are useful in various settings. We
briefly describe some of these applications here; for further details and additional
applications, see the full version of this paper [DFH11].


NON-INTERACTIVE SECURE COMPUTATION. In the non-interactive setting a 
receiver wishes to publish an encryption of its secret input x so that any other 
sender, holding a secret input y, will be able to obliviously evaluate f(x,y) 
and reveal it to the receiver. This problem is useful for many web applications in 
which a server publishes its information and many clients respond back. A recent 
work by Ishai et al. presents the first general protocol in this model 
with only black-box calls to a pseudorandom generator (PRG). In contrast, our 
protocol makes non black-box use of the fully homomorphic encryption but only 
requires polylogarithmic communication complexity. 
DELEGATABLE COMPUTATION. In this setting, a computationally weak client
wishes to outsource its computation to a more powerful server, with the aim 
that the server performs this computation privately and correctly. An important 
requirement in this scenario is that the amount of work put by the client in order 
to verify the correctness of the computation is substantially smaller than running 
this computation by itself. It is also important that the overall amount of work 
invested by the server grows linearly with the original computation. Lately, the 
problem has received a lot of attention; see for 
just a few examples. Our construction implies delegatable computation and can 
be simplified here because P1 (the client) is usually assumed to be honest, and
P2 (the server) does not contribute any input y to the computation. Therefore
we do not need a set-up assumption, and in contrast to earlier work, the scheme 
requires no expensive off-line phase and remains secure even if the server learns 
whether the client accepts its results. 


Concurrent Related Work. In recent concurrent and independent work, Bitansky
et al. and Goldwasser et al. both define notions of
extractable hash function that are technically slightly different from our EHF1
notion, but similar in spirit. They each propose instantiations different from
ours. They then build SNARGs based on this assumption, and also
build SNARGs that are in addition proofs of knowledge (SNARKs), and show
the very interesting result that the existence of SNARKs is equivalent to two
notions of extractable hash function similar to EHF1 and EHF2, respectively, known
as strong and weak proximity extractable hash functions.

Privacy for the prover is not considered in [GLR11]. In [BCCT11],
zero-knowledge SNARKs and secure computation based on them are shown in the CRS
model. They consider only stand-alone rather than UC security; on the other
hand they obtain a protocol whose communication complexity is independent of
the parties' inputs. This can also be obtained from our construction using a simple
modification based on PCP’s of knowledge, but UC security would be lost. 


2 Notations and Definitions 


In this section, we review standard notations. Due to space constraints, we do 
not give a definition of secure computation here, the definition and proof can 
be found in [DFH11]. We denote the security parameter by n and adopt the
convention whereby a machine is said to run in polynomial-time if its number of 
steps is polynomial in its security parameter. We use the standard definitions 
of negligible functions and indistinguishability of families of random variables, 
these can be found in the full version [DFH11]. For convenience, we use a single 
security parameter for all our primitives and proofs. For an integer t, we denote 
by [t] the set {1,...,t}, and by $\{0,1\}^{<t}$ the set of all binary strings of length at
most t − 1. If X is a random variable then we write x ← X for the value that
the random variable takes when sampled according to the distribution of X. If
A is a probabilistic algorithm running on input x, then we write y ← A(x) for
the output of A when run on input x.
2.1 Public Key Encryption Schemes


We specify the notion of public key encryption scheme. We use the standard 
notion of semantic security and refer to the full version [DFH11] for a formal
definition. 


Definition 1 (PKE). We say that $\Pi_E$ = (KeyGen, Enc, Dec) is a public key
encryption scheme (PKE) if KeyGen, Enc, Dec are algorithms specified as follows.


— KeyGen, given a security parameter n (in unary), outputs keys (pk, sk), where
pk is a public key and sk is a secret key. We denote this by (pk, sk) ←
KeyGen($1^n$).

— Enc, given the public key pk and a plaintext message m, outputs a ciphertext
c encrypting m. We denote this by c ← $\mathsf{Enc}_{pk}(m)$; and when emphasizing
the randomness R used for encryption, we denote this by c ← $\mathsf{Enc}_{pk}(m; R)$.

— Dec, given the secret key sk and a ciphertext c, outputs a plaintext message
m s.t. $\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m)) = m$.
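
As an illustration of the (KeyGen, Enc, Dec) interface only, here is a toy sketch in the style of textbook ElGamal. It is not the scheme used in the paper and is not secure as written: the modulus P, the base G, and the function names are assumptions chosen so that the correctness condition Dec(Enc(m)) = m can be checked by running the code.

```python
import secrets

# Toy parameters (assumptions for illustration): a Mersenne prime modulus and a
# fixed base. These are NOT suitable for real use.
P = 2**127 - 1
G = 3


def keygen():
    """Return (pk, sk) with pk = G^sk mod P."""
    sk = secrets.randbelow(P - 2) + 1
    return pow(G, sk, P), sk


def enc(pk, m, R=None):
    """Encrypt m in [1, P-1]; R is the explicit encryption randomness."""
    if not 1 <= m < P:
        raise ValueError("message must lie in [1, P-1]")
    if R is None:
        R = secrets.randbelow(P - 2) + 1
    return pow(G, R, P), (m * pow(pk, R, P)) % P


def dec(sk, c):
    """Recover m by dividing out the shared value c1^sk."""
    c1, c2 = c
    shared = pow(c1, sk, P)
    # Inverse via Fermat's little theorem, valid because P is prime.
    return (c2 * pow(shared, P - 2, P)) % P
```

Passing R explicitly mirrors the notation used when the encryption randomness is emphasized: the same (m, R) pair always yields the same ciphertext.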


2.2 Fully Homomorphic Encryption Schemes 


We define fully homomorphic encryption and additional desired properties. We 
will say that a bit string pk is a well-formed public key, if it can be generated as 
output from the KeyGen algorithm on input the security parameter and a set of 
random coins in the range specified for the key generation algorithm. Similarly, 
a bit string c is a well-formed ciphertext if c = $\mathsf{Enc}_{pk}(m; r)$ for a message m and
random coins r that lie in the range specified for the encryption algorithm.


Definition 2 (FHE). We say that $\Pi_E$ = (KeyGen, Enc, Dec, Eval) is a fully
homomorphic encryption scheme (FHE) if KeyGen, Enc, Dec are algorithms specified
as in Definition 1 and Eval is an algorithm specified as follows.


— Eval, given a well-formed public key pk, a boolean circuit C with fan-in of size
ℓ and well-formed ciphertexts $c_1, \ldots, c_\ell$ encrypting $m_1, \ldots, m_\ell$ respectively,
outputs a ciphertext c such that $\mathsf{Dec}_{sk}(c) = C(m_1, \ldots, m_\ell)$.


We further require the existence of a refresh algorithm Refresh so that for
well-formed pk, $c_1, \ldots, c_\ell$, the following distributions are statistically close,


$\{pk, \mathsf{Refresh}_{pk}(\mathsf{Eval}_{pk}(C, c_1, \ldots, c_\ell))\} \equiv_s \{pk, \mathsf{Refresh}_{pk}(\mathsf{Enc}_{pk}(C(m_1, \ldots, m_\ell)))\}$


Typically, Refresh would run Eval again on the ciphertext $\mathsf{Eval}_{pk}(C, c_1, \ldots, c_\ell)$, an
appropriately chosen encryption of zero and an addition gate. The idea is that
the randomness for the encryption of zero is chosen large enough to “drown”
the randomness coming from the original encryptions. We need that Refresh is
correct, in the sense that on input well-formed pk, $c_1, \ldots, c_\ell$ as above, it outputs
with probability 1 a ciphertext that decrypts to $C(m_1, \ldots, m_\ell)$. We also require
that $\Pi_E$ is semantically secure. Finally, we note that we require compactness in
the sense that the size of the output of Eval is upper bounded by some fixed polynomial
regardless of C or the input length.

We note that our requirements on correctness of the Eval and Refresh algo- 
rithms are stronger than what is usually assumed by existing schemes in the 
literature: we want them to generate output of the expected form with probabil- 
ity 1 whenever the input is well-formed, whereas other definitions only require 
correct behavior on average over the distribution we expect the input to have. 
We need the stronger requirement because we need Eval and Refresh to behave 
correctly even on adversarially generated input where we cannot assume a partic- 
ular distribution. All we can require is a ZK proof that the input is well formed. 
However, the stronger requirement can be assumed for all FHE schemes we are 
aware of [BV11b]: typically, the key generation and 
encryption involves choosing randomness according to a (discrete) Gaussian dis- 
tribution. Using a standard tail inequality, we can assume that randomness with 
the correct distribution is in some small range except with negligible probabil- 
ity and define well-formed public keys and ciphertexts to be those that can be 
produced using randomness that is in range. Since the probability of being out 
of range is negligible, this will not affect the security of honestly generated ci- 
phertext, on the other hand, the guaranteed bound on the randomness will give 
us room to evaluate and refresh without creating incorrect results. 


2.3 Efficient Probabilistically Checkable Proofs (PCP)


A PCP system Π = ($\mathsf{Prov}_{pcp}$, $\mathsf{Ver}_{pcp}$) for a language L consists of two PPT
algorithms: the prover $\mathsf{Prov}_{pcp}$ and the verifier $\mathsf{Ver}_{pcp}$. The prover $\mathsf{Prov}_{pcp}$ takes
as input an instance x ∈ L and a witness w for x and computes a proof π of
length ℓ := poly(|x|, |w|). The verifier $\mathsf{Ver}_{pcp}$ takes as input a potential member x and
decides whether x ∈ L given oracle access to the proof π. In this work, we
are interested in PCP systems where the verifier only has non-adaptive access
to the proof. To model this, we define the PCP verifier $\mathsf{Ver}_{pcp}$ as a tuple
of algorithms ($\mathsf{Ver}^1_{pcp}$, $\mathsf{Ver}^2_{pcp}$): the first has no access to the PCP π and uses only
polylog(|x|) bits of randomness to compute t := O(1) positions specifying where
to read the PCP. The second machine, $\mathsf{Ver}^2_{pcp}$, is deterministic and takes as input
the bit values of the PCP at these t positions. It outputs whether to accept or
reject π. We note that non-adaptivity is required as privacy of our protocol may
not hold in case of an adaptive corrupted verifier.
Formally, we require the following two properties to hold: 


Definition 3 (PCP). A probabilistically checkable proof (PCP) system
($\mathsf{Prov}_{pcp}$, $\mathsf{Ver}^1_{pcp}$, $\mathsf{Ver}^2_{pcp}$) for a language L is a triple of (probabilistic)
polynomial-time machines, satisfying


— Completeness: If x ∈ L, π ← $\mathsf{Prov}_{pcp}(x, w)$ and $(q_1, \ldots, q_t) \leftarrow \mathsf{Ver}^1_{pcp}(x, \ell; r)$
with $q_i \in [\ell]$, then $\Pr[\mathsf{Ver}^2_{pcp}(x, \pi[q_1], \ldots, \pi[q_t], q_1, \ldots, q_t) = 1] = 1$.

— Soundness: If x ∉ L, then for all π we have


$\Pr[(q_1, \ldots, q_t) \leftarrow \mathsf{Ver}^1_{pcp}(x, |\pi|; r) : \mathsf{Ver}^2_{pcp}(x, \pi[q_1], \ldots, \pi[q_t], q_1, \ldots, q_t) = 1] < \mathsf{negl}(n)$,


for negligible function negl(·), probability taken over the verifier’s internal
coins.


Notice that standard definitions of PCP systems usually require the soundness 
error to be smaller than 1/2. We get a negligible soundness error by amplification. 
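
As a worked illustration of this amplification step (a sketch under the assumption that the base verifier has soundness error at most 1/2 and is run k times with independent coins on the same fixed proof π), the runs are independent events for a fixed π, so for x ∉ L:

```latex
\Pr[\text{all } k \text{ runs accept}]
  \;=\; \prod_{i=1}^{k} \Pr[\text{run } i \text{ accepts}]
  \;\le\; \left(\tfrac{1}{2}\right)^{k}
  \;=\; 2^{-k}.
```

Taking, for example, k = n gives error at most $2^{-n}$, which is negligible in n. The price is that the number of positions read grows from t to k · t.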

In this paper, we are interested in PCP’s for NP languages such that the 
verifier accepts or rejects after using only polylog(|x|) bits of randomness and
accessing only O(1) bits of π. Moreover, we are interested in efficient protocols
and, hence, require that the (probabilistic) prover runs in poly(|x|, |w|) time.
PCP proof systems with efficient verifiers were introduced in the seminal work 
of Babai, Fortnow, Levin and Szegedy [BFLS91]. More efficient candidates have 
for instance been proposed in [Din07]. Most PCP systems 
require only a non-adaptive verifier and, hence satisfy our additional property 
from above. 


2.4 Collision Resistant Hashing and Merkle Trees 


In the following, let $\{\mathcal{H}_n\}_{n \in \mathbb{N}} = \{H : \{0,1\}^{p(n)} \to \{0,1\}^{p'(n)}\}_n$ be a family of
hash functions, where p(·) and p'(·) are polynomials so that p'(n) < p(n) for
sufficiently large n ∈ N. For a hash function H ← $\mathcal{H}_n$, a Merkle hash tree is
a data structure that allows one to commit to ℓ = $2^d$ messages by a single hash value
h such that revealing any message requires revealing only O(d) hash values. A
Merkle hash tree is represented by a binary tree of depth d where the ℓ messages
$m_1, \ldots, m_\ell$ are assigned to the leaves of the tree. The values that are assigned
to the internal nodes are computed using the underlying hash function H. The
single hash value h that commits to the ℓ messages $m_1, \ldots, m_\ell$ is assigned to
the root of the tree. To open the commitment to a message $m_i$, one reveals $m_i$
together with all the values assigned to nodes on the path from the root to $m_i$,
and the values assigned to the siblings of these nodes. We denote the algorithm
of committing to ℓ messages $m_1, \ldots, m_\ell$ by h = Commit($m_1, \ldots, m_\ell$) and the
opening of $m_i$ by ($m_i$, path(i)) = Open(h, i). Verifying the opening of $m_i$ is
carried out by essentially recomputing the entire path bottom-up while comparing
the final outcome (i.e., the root) to the value given at the commitment phase.
For simplicity, we abuse notation and denote by path(i) both the values assigned
to the nodes in the path from the root to the decommitted value $m_i$, together with
the values assigned to their siblings.
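
The commit, open and verify steps described above can be sketched as follows. This is a minimal illustration, not the construction's actual instantiation: SHA-256 stands in for H, leaves are hashed before being placed in the tree, the prefix bytes 0x00 and 0x01 separating leaves from internal nodes are an added assumption, and the opening carries only the sibling values, since the values on the path itself are recomputed by the verifier.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Stand-in for the underlying hash function (assumption: SHA-256)."""
    return hashlib.sha256(data).digest()


def build_tree(messages):
    """Return all tree levels, leaves first; the number of messages must be 2^d."""
    n = len(messages)
    if n == 0 or n & (n - 1):
        raise ValueError("number of messages must be a power of two")
    level = [H(b"\x00" + m) for m in messages]
    levels = [level]
    while len(level) > 1:
        level = [H(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def commit(messages) -> bytes:
    """h = Commit(m_1, ..., m_l): the value assigned to the root."""
    return build_tree(messages)[-1][0]


def open_leaf(messages, i):
    """Return (m_i, siblings along the path from leaf i up to the root)."""
    path, idx = [], i
    for level in build_tree(messages)[:-1]:
        path.append(level[idx ^ 1])  # sibling of the current node
        idx //= 2
    return messages[i], path


def verify(root: bytes, i: int, message: bytes, path) -> bool:
    """Recompute the path bottom-up and compare the result with the root."""
    node = H(b"\x00" + message)
    for sibling in path:
        if i % 2 == 0:
            node = H(b"\x01" + node + sibling)
        else:
            node = H(b"\x01" + sibling + node)
        i //= 2
    return node == root
```

For ℓ = 2^d messages an opening contains exactly d sibling values, matching the O(d) bound stated above.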

The standard security property of a Merkle hash tree is collision resistance. 
Intuitively, this says that it is infeasible to efficiently find a pair ($x_1$, $x_2$) so that
H($x_1$) = H($x_2$), where H ← $\mathcal{H}_n$ for sufficiently large n. One can show that
collision resistance of $\{\mathcal{H}_n\}_{n \in \mathbb{N}}$ carries over to the Merkle hashing. Formally,
Definition 4 (Collision Resistance). A family of hash functions $\{\mathcal{H}_n\}_n$ is
collision resistant if for all PPT adversaries A there exists a negligible function
negl such that for sufficiently large n ∈ N we have $\Pr[\mathsf{Hash}_{A,\mathcal{H}_n}(n) = 1] \le \mathsf{negl}(n)$
where game $\mathsf{Hash}_{A,\mathcal{H}_n}(n)$ is defined as follows:


1. A hash function H is sampled H ← $\mathcal{H}_n$.
2. The adversary A is given H and outputs x, x'.
3. The output of the game is 1 if and only if x ≠ x' and H(x) = H(x').
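
The game can be sketched in a few lines. To show why the output length must grow with n, the sketch below plays it against a deliberately weak hash (SHA-256 truncated to 16 bits), for which a simple birthday-style search wins. The names make_hash, birthday_adversary and hash_game are hypothetical, and using a single fixed function instead of sampling H from a keyed family is a simplification.

```python
import hashlib
import itertools


def make_hash(out_bits: int):
    """Return a toy hash: SHA-256 truncated to out_bits bits (assumption)."""
    def H(x: bytes) -> int:
        return int.from_bytes(hashlib.sha256(x).digest(), "big") >> (256 - out_bits)
    return H


def birthday_adversary(H):
    """Hash distinct inputs until two of them collide."""
    seen = {}
    for counter in itertools.count():
        x = counter.to_bytes(8, "big")
        h = H(x)
        if h in seen:
            return seen[h], x
        seen[h] = x


def hash_game(adversary, H) -> int:
    """Steps 2 and 3 of the game: output 1 iff x != x' and H(x) = H(x')."""
    x, x_prime = adversary(H)
    return 1 if x != x_prime and H(x) == H(x_prime) else 0
```

With a 16-bit output the search must succeed after at most 2^16 + 1 distinct inputs by the pigeonhole principle, so this adversary wins with probability 1 against the toy hash.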


2.5 Non-interactive Zero-Knowledge Proofs 


In the following we repeat the definition of non-interactive zero-knowledge proof. 


Definition 5. A non-interactive zero-knowledge proof for a language L is a tuple 
of three PPT algorithms (CRSGen,P,V), such that the following properties are 
satisfied: 


Completeness: For every (x, w) ∈ $R_L$ (for $R_L$ the witness relation of L)
$\Pr[crs \leftarrow \mathsf{CRSGen}(1^n) : V(crs, x, P(crs, x, w)) = 1] = 1$.


Soundness: For every PPT algorithm A there exists a negligible function negl
such that for all x ∉ L


$\Pr[crs \leftarrow \mathsf{CRSGen}(1^n), (x, \pi) \leftarrow A(crs) : V(crs, x, \pi) = 1] < \mathsf{negl}(n)$.


Zero-Knowledge: There exists a PPT simulator S = ($S_1$, $S_2$) such that for all
(x, w) ∈ $R_L$ the distributions (i) {P(crs, x, w)} and (ii) {$S_2$(crs, x, td)} are
computationally indistinguishable, where in (i) crs ← CRSGen($1^n$) and in
(ii) (crs, td) ← $S_1(1^n)$.


2.6 Extractable Hash Functions 


In this work, we are interested in hash functions that are extractable, so-called
extractable hash functions (EHFs). We provide two flavors of extractable hash
functions. The first extractability assumption (EHF1) considers a hash function
H mapping into a small subset of a large domain and essentially asserts that the
only way to generate an element in Im(H) is to compute the function on a given
input. More precisely, we require that for every adversary outputting a value
h there exists an efficient extractor that (given the same randomness) outputs
a preimage of h, whenever h ∈ Im(H). We propose later an instantiation of
EHF1-extractable and collision intractable hash functions based on a knowledge
of exponent assumption (Damgård [Dam91]) in $\mathbb{Z}_N^*$ where N is an RSA modulus.
We continue with the formal assumption. For simplicity, we assume that the 
algorithms below are keeping their state. 
Definition 6 (Extractable hash function 1 (EHF1)). Let A and E be PPT
algorithms; then consider the following game:


— $\mathsf{EHF1}_{A,E,\mathcal{H}_n}(1^n, z)$:

H ← $\mathcal{H}_n$
Repeat until A halts:
    h ← A($1^n$, H, z; R)
    x ← E($1^n$, H, z, R, h; R')
    If h ∈ Im(H) and H(x) ≠ h return 1, else reply A with x
Return 0


for R and R' the randomness used by A and E respectively. Then the family
$\{\mathcal{H}_n\}_{n \in \mathbb{N}}$ satisfies the first extractability assumption (EHF1) if for every PPT
adversary A there exists a PPT extractor E such that for any sufficiently large
n ∈ N and any auxiliary information z ∈ $\{0,1\}^*$


$\Pr[\mathsf{EHF1}_{A,E,\mathcal{H}_n}(1^n, z) = 1] \le \mathsf{negl}(n)$

for a negligible function negl, where the probability is over the randomness of the game.


In the above definition, we require that it should be feasible to verify that a 
value h is in the image of H; we call this function Im(H).

The second extractability assumption (EHF2) makes a weaker demand on 
the hash function H: as before, we require that for each adversary outputting h, 
there exists an extractor that tries to find a preimage. This time, however, the 
extractor is allowed to fail even if h ∈ Im(H). Specifically, the demand is that
if the extractor fails, the adversary cannot output a preimage either. For this
definition not to be vacuous, one clearly needs that when the adversary tries to
“beat” the extractor, it is given randomness/auxiliary input that is not known
to the extractor. Otherwise the extractor could simulate the adversary and output
whatever the adversary does. To formalize this, we assume a probabilistic
algorithm G that outputs a pair (ζ, ζ'), sampled from some joint distribution. ζ
is given to both the adversary and the extractor, while ζ' is only given to the
adversary later when she tries to “beat” the extractor. In our case, ζ is a public
key for an encryption scheme and ζ' is its corresponding secret key. Notice
that our demand on G is weak as G does not depend on the choice of the hash 
function. 

Finally, we note that in [BCCT11] a simpler definition is considered, where
the adversary runs an arbitrary algorithm in the last stage of the game and
the extractor is required to work for any such algorithm. In particular, it must
work for an adversary that knows something not known to the extractor. This
is a much stronger demand that may exclude some potential constructions of
extractable hash functions.¹


¹ [BCCT11] also considers weaker variants. While the basic idea of EHF2 is a
contribution of this paper, the precise formulation was in part inspired by
discussions with the authors of [BCCT11].

Definition 7 (Extractable hash function 2 (EHF2)). Let A and E be PPT
algorithms, then consider the following game:

Game $\mathsf{EHF2}_{A,G,E,H_n}(1^n, z)$:

    $i = 0$, $H \leftarrow H_n$, $(\zeta, \zeta') \leftarrow G(1^n)$
    Repeat until A halts:
        $i = i + 1$
        $h_i \leftarrow A(1^n, H, z, \zeta; R)$
        $x_i \leftarrow E(1^n, H, z, R, h_i, \zeta; R')$
    $x \leftarrow A(\zeta', H, z, R, R'')$
    If $\exists\, 1 \le j \le i$ s.t. $H(x_j) \neq h_j \wedge H(x) = h_j$ return 1, else return 0

Then $\{H_n\}_{n \in \mathbb{N}}$ satisfies the EHF2 assumption if for every PPT adversary A
and any PPT algorithm G there exists a PPT extractor E such that for any
sufficiently large $n \in \mathbb{N}$ and any auxiliary information $z \in \{0,1\}^*$

    $\Pr[\mathsf{EHF2}_{A,G,E,H_n}(1^n, z) = 1] \le \mathsf{negl}(n)$

for a negligible function negl, where the probability is over the randomness of the game.


When we talk in the following of an extractable hash function, we mean
that it satisfies the property given in Definition 7, i.e., any PPT adversary has
a negligible advantage in $\mathsf{EHF2}_{A,G,E,H_n}$.

Note that EHF2 is true in the random oracle model, where we let the random
oracle play the role of H. In this case it is easy to see that no matter how the 
adversary produces a string h, there are only two cases: either h was output by 
the random oracle or not. In the former case a preimage is easy to extract, in the 
latter case no one can produce a preimage except with negligible probability. So 
the extractor can safely fail in this case. 
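The random-oracle argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for the random oracle (it is not one), and the names `LoggingRandomOracle` and `extract` are hypothetical. The extractor simply looks up h among the recorded oracle answers and fails otherwise.

```python
import hashlib


class LoggingRandomOracle:
    """Stand-in random oracle that records every (output, input) pair."""

    def __init__(self):
        self.log = {}                       # maps oracle output -> queried input

    def query(self, x: bytes) -> bytes:
        h = hashlib.sha256(x).digest()      # SHA-256 is only a placeholder
        self.log[h] = x
        return h


def extract(oracle: LoggingRandomOracle, h: bytes):
    """Return a preimage if h was output by the oracle, else None (fail)."""
    return oracle.log.get(h)


oracle = LoggingRandomOracle()
h_queried = oracle.query(b"some input")     # case 1: h was output by the oracle
h_guessed = bytes(32)                       # case 2: h produced without a query
```

In the first case the extractor returns the logged input; in the second it returns `None`, which mirrors the case where the extractor can safely fail.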

It is easy to verify that EHF1 implies EHF2: under EHF1, the extractor only 
fails if it is impossible to find a preimage. 


2.7 The Knowledge of Exponent Assumption 


The knowledge of exponent assumption proposed by Damgård was previously
used in designing 3-round zero-knowledge proofs [HT98], plaintext-aware
encryption and more. It was originally defined with respect to
prime order groups; here we consider its variant for composite order groups.
Say N is a product of two safe primes $p = 2p' + 1$ and $q = 2q' + 1$. We consider
the group of so-called signed quadratic residues $QR_N^+$. It consists of all
numbers in $\mathbb{Z}_N$ with Jacobi symbol 1 in the interval $[0, \ldots, (N-1)/2]$. The
product of $a, b \in QR_N^+$ is defined to be $ab \bmod N$ if $ab \bmod N \le (N-1)/2$ and
$N - (ab \bmod N)$ otherwise. $QR_N^+$ is isomorphic to the group of quadratic residues
mod N and so has order $p'q'$. Furthermore, it has the nice property that membership
in $QR_N^+$ is easy to check. We let $g, g'$ be generators for $QR_N^+$ where
$g' = g^x$ and x is picked at random from $\mathbb{Z}_{p'q'}$. Informally, the assumption says
that for any PPT algorithm $A(N, g, g')$ that outputs $h, h'$ such that $h = g^y$ and
$h' = g'^y$ there exists an extractor E such that $(h, h', y) \leftarrow E(N, g, g')$ with high
probability. We refer the reader to the full version for a formal definition of the
knowledge of exponent assumption in the group of signed quadratic residues.

Based on the knowledge of exponent assumption, we can construct an extractable
hash function according to Definition 6. Moreover, under the factoring
assumption our construction is collision resistant. The public parameters of our
family of hash functions are a composite N which is the product of two safe
primes $p = 2p' + 1$ and $q = 2q' + 1$ and two generators $g, h$ for $QR_N^+$. For
some concrete $N, p, q, g, h$, we compute the hash function on some input z as
$H(z) = (g^z \bmod N, h^z \bmod N)$. Collision resistance follows from factoring, since
for every $z \neq z'$ such that $H(z) = H(z')$ it holds that $p'q'$ divides $z - z'$. Moreover,
if one knows x such that $h = g^x \bmod N$, then one can check membership of
a pair $(a, b)$ in the image of H by checking whether $a \in QR_N^+$ and $a^x \bmod N = b$.
Finally we note that H is an EHF1, which follows from the knowledge of exponent
assumption.
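As a sanity check of the construction, here is a small Python sketch of the hash over signed quadratic residues. All concrete values are assumptions chosen for illustration and are far too small to be secure: the safe primes 11 and 23, the generator g = 4 and the trapdoor exponent x = 3. The helper names (`signed`, `sq_pow`, `in_image`) are hypothetical.

```python
# Toy parameters (insecure): safe primes p = 2*5 + 1 and q = 2*11 + 1.
P_PRIME, Q_PRIME = 5, 11
N = (2 * P_PRIME + 1) * (2 * Q_PRIME + 1)   # N = 11 * 23 = 253
ORDER = P_PRIME * Q_PRIME                   # order p'q' of the signed QR group


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                          # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def signed(a):
    """Map a residue to its representative in [0, (N-1)/2]."""
    a %= N
    return a if a <= (N - 1) // 2 else N - a


def sq_mul(a, b):
    return signed(a * b)                     # group operation of QR_N^+


def sq_pow(a, e):
    return signed(pow(a, e, N))              # exponentiation in QR_N^+


def in_signed_qr(a):
    return 0 < a <= (N - 1) // 2 and jacobi(a, N) == 1


G = signed(4)            # 4 = 2^2 is a quadratic residue of order 55 modulo 253
X = 3                    # trapdoor exponent with h = g^x
H_GEN = sq_pow(G, X)     # second generator h


def H(z):
    return (sq_pow(G, z), sq_pow(H_GEN, z))


def in_image(a, b, x):
    """Membership test from the text: a in QR_N^+ and a^x = b."""
    return in_signed_qr(a) and sq_pow(a, x) == b
```

Inputs that differ by a multiple of p'q' = 55 collide, which is why a collision reveals a multiple of the group order; a pair whose second component is shifted by g fails the membership test.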


3 Secure Two-Party Computation with Low 
Communication 


Consider two parties $P_1$ with input x and $P_2$ with input y, respectively, who wish
to jointly compute a function f(x,y). Without loss of generality we only consider
single-output functions and assume that only $P_1$ learns the output f(x,y) (the
general case can be easily obtained from this special case but this requires
additional communication). We are interested in protocols that allow $P_1$ and $P_2$
to securely compute f(x,y) in the presence of malicious adversaries that follow
arbitrary behavior. Our proof of security guarantees the strongest notion of
simulation-based UC security in the presence of static malicious adversaries.
Moreover, we require that our protocol achieves the following strong properties:

— Polylogarithmic communication complexity in the size of the circuit C that
computes f.
— One round complexity, i.e., a single message in each direction assuming an
appropriate trusted setup. In this work we prove our protocol in the common
reference string model.
— Polylogarithmic workload for $P_1$ in the size of the circuit C.

We introduce our main construction step-by-step. Our starting point is a
standard protocol secure against honest-but-curious adversaries in which party
$P_1$ sends its encrypted input to party $P_2$, who uses the homomorphic property
to compute ciphertexts that contain the output of the specified circuit
when evaluated on $P_1$'s (encrypted) input and his own private input. These
ciphertexts are sent to $P_1$, who can decrypt and learn the result. Obviously,
this solution completely breaks down against malicious attacks, so additional
cryptographic tools must be used in order to ensure correct behavior. We then
use this protocol as a building block in our main construction, adding new tools
to protect against an increasingly powerful adversary. Namely, we first show how
to prove security in the presence of a corrupted $P_1$ and then prove
simulation-based security for both corruption cases. For completeness, we formally describe
the standard protocol with security against honest-but-curious adversaries.


66 I. Damgard, S. Faust, and C. Hazay 


3.1 Security against Honest-But-Curious Adversaries 


We begin with a standard protocol with security in the face of honest-but-curious
adversaries. The main building block here is fully homomorphic encryption $\Pi_E$ =
(KeyGen, Enc, Dec, Eval, Refresh).


Protocol 1 (Honest-but-curious adversaries) 


— Inputs: Input x for party $P_1$ and input y for party $P_2$. A description of function
f for both.
— The protocol:
1. $P_1(x)$ generates a key pair $(pk_{comp}, sk_{comp}) \leftarrow \mathsf{KeyGen}(1^n)$ for a fully
homomorphic encryption scheme, computes $e_x = \mathsf{Enc}_{pk_{comp}}(x)$ and sends $(pk_{comp}, e_x)$ to
$P_2$.
2. $P_2(y)$ computes $d = \mathsf{Eval}_{pk_{comp}}(C_f, y, e_x)$ and sends $c = \mathsf{Refresh}_{pk_{comp}}(d)$ to $P_1$.
3. $P_1$ decrypts c and obtains the result of the computation $f(x,y) = \mathsf{Dec}_{sk_{comp}}(c)$.


Security of $P_1$ follows by the semantic security of $\Pi_E$. Similarly, security of $P_2$
follows from the ability to refresh the ciphertext sent back to $P_1$ so that it only
encrypts the outcome. It is easy to see that the communication complexity is
independent of the size of the circuit C that computes f, and only
depends on the lengths of its inputs and outputs, and the security parameter.
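The message flow of Protocol 1 can be traced with the following Python sketch. It is emphatically not an encryption scheme: `MockFHE` is a hypothetical, interface-only stand-in in which the "ciphertext" carries the plaintext in the clear, and the `trace` field is an invented placeholder for whatever evaluation residue a real Refresh must erase. The sketch only shows who calls which algorithm and what is sent.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Ciphertext:
    key_id: int          # identifies the public key the ciphertext belongs to
    payload: int         # MOCK: the plaintext sits here in the clear
    trace: tuple = ()    # MOCK: stands for evaluation residue that Refresh removes


class MockFHE:
    """Interface-only stand-in for (KeyGen, Enc, Dec, Eval, Refresh)."""

    @staticmethod
    def keygen():
        key_id = secrets.randbits(64)
        return key_id, key_id                      # (pk, sk); identical in the mock

    @staticmethod
    def enc(pk, m):
        return Ciphertext(pk, m)

    @staticmethod
    def evaluate(pk, circuit, y, e_x):
        # A real scheme computes on the ciphertext; the mock applies the circuit directly.
        return Ciphertext(pk, circuit(e_x.payload, y), trace=("evaluated with", y))

    @staticmethod
    def refresh(pk, d):
        return Ciphertext(pk, d.payload)           # keep only the encrypted outcome

    @staticmethod
    def dec(sk, c):
        assert c.key_id == sk, "ciphertext under a different key"
        return c.payload


def protocol1(f, x, y, fhe=MockFHE):
    pk, sk = fhe.keygen()                          # step 1 (P1): keys and e_x
    e_x = fhe.enc(pk, x)
    d = fhe.evaluate(pk, f, y, e_x)                # step 2 (P2): evaluate, then refresh
    c = fhe.refresh(pk, d)
    return fhe.dec(sk, c), c                       # step 3 (P1): decrypt c
```

Only `(pk, e_x)` and `c` cross the wire, so the communication is determined by the input and output lengths rather than by the circuit.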


3.2 Security against a Malicious $P_1$


We extend the above protocol and allow $P_1$ to be malicious (if corrupted),
while $P_2$ remains honest-but-curious. To this end, we use standard techniques
to achieve security in the malicious setting by relying on NIZK proof systems
(CRSGen, P, V) and an idealized setup. Specifically, we let $P_1$ send two
encryptions of the same plaintext under two different keys (one public key for
which $P_1$ knows the secret key, and another public key that is placed in the
common reference string). This enables the simulator to extract x
using the trapdoor of the common reference string. In addition to that, $P_1$ must
prove that its public key, together with the ciphertexts, are well-formed. Note
that the statement proved below asserts that each ciphertext is produced from
a message and randomness of the expected range, so it is implicitly asserted
that these ciphertexts are well-formed. Nevertheless, we still need to prove
well-formedness of $pk_{comp}$. This is essentially immediate when specifying the random
coins used to generate it as part of the witness, since all it takes is to verify
whether these coins are of the expected range. In order to formalize this proof
we define language L as follows.


    $L := \{(e_x, e'_x, pk_{comp}, pk_x) : \exists\, (sk_{comp}, r_{pk}, r_x, r'_x, x)$ s.t. $e_x = \mathsf{Enc}_{pk_{comp}}(x; r_x)$
        $\wedge\ e'_x = \mathsf{Enc}_{pk_x}(x; r'_x) \wedge (pk_{comp}, sk_{comp}) \leftarrow \mathsf{KeyGen}(1^n; r_{pk})$
        $\wedge\ r_{pk}$ yields a well formed $pk_{comp}\}$.

This proof is utilized in Step 1b of Protocol 2. The complete protocol follows.

Protocol 2 (Malicious $P_1$)

— Setup: Generate keys $(pk_x, sk_x) \leftarrow \mathsf{KeyGen}(1^n)$. Set the common reference string
crs = $(pk_x, \sigma)$, where $\sigma \leftarrow \mathsf{CRSGen}(1^n)$ is the common reference string used for
proving membership in L.

— Input: Input x for party $P_1$ and input y for party $P_2$. A description of function f
for both.

— The protocol:

1. First message computed by party $P_1$.

a) Setup. Generate a key pair $(pk_{comp}, sk_{comp}) \leftarrow \mathsf{KeyGen}(1^n)$ for a fully
homomorphic encryption scheme and compute $e_x = \mathsf{Enc}_{pk_{comp}}(x)$.

b) Proof of consistency. Compute $e'_x = \mathsf{Enc}_{pk_x}(x)$ and a NIZK proof $\pi_L$
proving that $pk_{comp}$ and $e_x$ are well-formed and that $e_x$ and $e'_x$ encrypt the
same plaintext x.

c) The complete message. Send $(e_x, e'_x, pk_{comp}, pk_x, \pi_L)$ to $P_2$.

2. Second message computed by party $P_2$.

a) Verification of NIZK. Upon receiving message $(e_x, e'_x, pk_{comp}, pk_x, \pi_L)$
from $P_1$, verify $\pi_L$ by running $V((e_x, e'_x, pk_{comp}, pk_x), \pi_L)$. If it outputs 0,
then abort.

b) Circuit evaluation. Compute $d = \mathsf{Eval}_{pk_{comp}}(C_f, y, e_x)$ for $C_f$ a PPT
circuit computing f, and refresh the ciphertext to get $c = \mathsf{Refresh}_{pk_{comp}}(d)$.

c) The complete message. Send the result c to $P_1$.

3. The output. $P_1$ decrypts c and obtains the result of the computation
$f(x,y) = \mathsf{Dec}_{sk_{comp}}(c)$.


Clearly, if both parties behave honestly $P_1$ learns the correct output.


Theorem 8 (One-Sided Security). If $\Pi_E$ = (KeyGen, Enc, Dec, Eval, Refresh)
is semantically secure and (CRSGen, P, V) is a non-interactive zero-knowledge
proof, Protocol 2 securely evaluates f in the presence of malicious $P_1$ and
honest-but-curious $P_2$ with constant communication in the circuit-size for f.


Intuitively, security against malicious $P_1$ follows from the soundness of proof $\pi_L$.
A simulator $S_1$ for an adversary corrupting $P_1$ can be designed by first verifying
the proof $\pi_L$. Next, $S_1$ extracts the adversary’s input x’ using the secret key
$sk_x$. $S_1$ sends x’ to the trusted party computing f and receives the outcome.
It then encrypts this value and sends it back to the adversary. Security against
corrupted $P_2$ follows from the semantic security property of $\Pi_E$. Communication
complexity depends only on the input/output length of f.


3.3 Security against Malicious Adversaries 


In this section we present our full protocol that protects against malicious
adversarial attacks. Our protocol uses Protocol 2 as a building block but adds
additional tools. This essentially amounts to a SNARG allowing $P_1$ to verify the
correctness of the output issued by $P_2$. More precisely:

— We first add a PCP system $(\mathsf{Prov}_{pcp}, (\mathsf{Ver}^0_{pcp}, \mathsf{Ver}^1_{pcp}))$ (cf. Definition 3), used
by $P_2$ for proving membership in the language $L_1$. Formally, $L_1$ is defined by

    $L_1 := \{(c, e_x, pk_{comp}, e_y, pk_y, f) : \exists\, (d, r_a, r_y, y)$ s.t. $d = \mathsf{Eval}_{pk_{comp}}(C_f, y, e_x)$
        $\wedge\ c = \mathsf{Refresh}_{pk_{comp}}(d; r_a) \wedge e_y = \mathsf{Enc}_{pk_y}(y; r_y)\}$.

Namely, the PCP shows that if one decrypts c it gets the desired result
f(x,y), where x is the plaintext contained in $e_x$ and y is the plaintext in $e_y$.
This proof is utilized in Step 2c of Protocol 3. We recall that the statement
proved asserts that $e_y$ is produced from a message and randomness of the
expected range so it is implicitly asserted that $e_y$ is well-formed.

— We further let $P_2$ commit to this proof using a Merkle hash tree instantiated
with an extractable collision-resistant hash function H (cf. Definition 7).
The main problem with this is that hashing the
proof does not necessarily conceal it, unless a special hiding property is
required from the underlying hash function. We fix that by hashing the
committed PCP instead, and then prove that the values embedded within these
commitments correspond to a valid proof.

— Furthermore, since the verifier must not see the queried bits from the proof
(due to privacy considerations), we consider an NP statement claiming that
if the PCP verifier $\mathsf{Ver}^1_{pcp}$ is run on $\mathsf{Dec}_{sk_y}(c_{q_1}), \ldots, \mathsf{Dec}_{sk_y}(c_{q_t})$, denoting
the ciphertexts encrypting $(\Pi_{q_1}, \ldots, \Pi_{q_t})$, the openings for the PCP queries
$(q_1, \ldots, q_t)$, then it will accept. That is,

    $L_2 := \{(z_{pcp}, (q_1, \ldots, q_t), (c_{q_1}, \ldots, c_{q_t})) :$
        $\exists\, (\Pi_{q_1}, \ldots, \Pi_{q_t}, \gamma_{q_1}, \ldots, \gamma_{q_t})$ s.t. $(\forall i \in [t] : c_{q_i} = \mathsf{Enc}_{pk_y}(\Pi_{q_i}; \gamma_{q_i}))$
        $\wedge\ \mathsf{Ver}^1_{pcp}(z_{pcp}, \Pi_{q_1}, \ldots, \Pi_{q_t}) = 1\}$

for the instance $z_{pcp} \in L_1$. In our protocol, the queries $q_1, \ldots, q_t$ and the
ciphertexts $c_{q_1}, \ldots, c_{q_t}$ are all encrypted under FHE with respect to public key
$pk_{pro}$, enabling $P_1$ to verify this proof.
Note that the code of $\mathsf{Ver}^1_{pcp}$ is independent of the strategy followed by a
malicious $P_1$. Furthermore, notice that we do not explicitly need to
include checks of well-formedness for the ciphertexts $c_{q_1}, \ldots, c_{q_t}$ since these
are implied by the fact that the ciphertexts are possible outputs on proper
inputs $\Pi_{q_i}, \gamma_{q_i}$. This proof is utilized in Step 2f in Protocol 3. Importantly,
the number of queries asked by $P_1$ is polylogarithmic in the PCP size (and
hence in the circuit-size that computes f).

The above implies that $P_1$ has to provide encryptions of the queries $q_1, \ldots, q_t$.
In order to ensure correctness of these queries, we add a non-interactive
zero-knowledge proof in which $P_1$ proves that the queries were indeed sampled from
the correct range. This is formalized in Step 1c of Protocol 3 below.


An overview of our protocol. We summarize the discussion above. (1) At first,
$P_1$ sends its input x encrypted under two distinct public keys together with
the encrypted PCP queries and a proof of correct behavior. (2) $P_2$ then replies
with ciphertexts that contain the output of the specified circuit, as generated
above. It then produces a PCP for this computation and commits to it using a
Merkle tree. Finally, $P_2$ computes ciphertexts that contain the answers for the
PCP queries by opening the corresponding paths in the Merkle tree generated
above (note that this step is performed obliviously within the fully homomorphic
encryption scheme). $P_2$ sends the computation of f(x,y) and answers to PCP
queries with a non-interactive zero-knowledge proof for correct computations.
Intuitively, the overall communication complexity depends on the number of
PCP queries, the answers to these queries and the overhead induced by the
non-interactive zero-knowledge proofs. Recall first that PCP systems are sound even
after observing only polylogarithmically many bits of the proof. Moreover, each answer to
such a query requires providing the corresponding path in the hashed Merkle tree
of the PCP, which includes a logarithmic number of elements (in the proof’s size).
Finally, we utilize zero-knowledge proofs with communication that is polynomial
in the size of the witness. All these tools ensure that the overall communication
is polylogarithmic in the circuit’s size. We are now ready to present our protocol.


Protocol 3 (Malicious adversaries) 


— Setup: Generate keys $(pk_x, sk_x) \leftarrow \mathsf{KeyGen}(1^n)$ and $(pk_y, sk_y) \leftarrow \mathsf{KeyGen}(1^n)$.²
Set the common reference string crs = $(pk_x, pk_y, \sigma)$, where $\sigma$ is a joint common
reference string used by $P_1$ for proving membership in L and by $P_2$ for proving
membership in $L_1$ and $L_2$. Pick an extractable collision-resistant hash function
$H \leftarrow H_n$.

— Input: Input x for party $P_1$ and input y for party $P_2$. A description of function f
for both.

— The protocol:

1. First message computed by party $P_1$.

a) Setup. Generate key pairs for a fully homomorphic encryption scheme
$(pk_{comp}, sk_{comp}) \leftarrow \mathsf{KeyGen}(1^n)$ and $(pk_{pro}, sk_{pro}) \leftarrow \mathsf{KeyGen}(1^n)$, and
compute $e_x = \mathsf{Enc}_{pk_{comp}}(x)$.

b) Proof of consistency. Compute $e'_x = \mathsf{Enc}_{pk_x}(x)$ and a NIZK proof $\pi_L$
proving that $pk_{comp}$, $pk_{pro}$ and $e_x$ are well-formed and that $e_x$ and $e'_x$ encrypt
the same plaintext x.

c) Queries for PCP. Sample t positions $(q_1, \ldots, q_t) \leftarrow \mathsf{Ver}^0_{pcp}(1^n, \ell)$ and
for each i encrypt them as $b_i = \mathsf{Enc}_{pk_{pro}}(q_i)$. Moreover, for each i compute
a NIZK proof $\pi_i$ that $q_i$ lies in the correct range $[\ell]$.

d) The complete message. Send $m_1 := ((e_x, e'_x, pk_{comp}, pk_{pro}, \pi_L), (b_i, \pi_i)_{i \in [t]})$
to $P_2$.

2. Second message computed by party $P_2$.

a) Verification of NIZK’s. Upon receiving message $m_1$ from $P_1$, verify $\pi_L$
by running $V((e_x, e'_x, pk_{comp}, pk_{pro}), \pi_L)$. If it outputs 0, then abort.

b) Circuit evaluation. Compute $d = \mathsf{Eval}_{pk_{comp}}(C_f, y, e_x)$ and refresh it to
get $c = \mathsf{Refresh}_{pk_{comp}}(d; r_a)$. Also, compute $e_y = \mathsf{Enc}_{pk_y}(y; r_y)$.

c) Compute PCP. Compute a PCP $\Pi = \mathsf{Prov}_{pcp}(z_{pcp}, w_{pcp})$ of length $\ell$ =
poly(n), where $w_{pcp} := (d, r_a, r_y, y)$ forms an NP witness for the instance
$z_{pcp} := (c, e_x, pk_{comp}, e_y, pk_y, f) \in L_1$.

d) Commit to PCP. For $i \in [\ell]$ compute ciphertexts $c_i = \mathsf{Enc}_{pk_y}(\Pi_i; \gamma_i)$ and
compute the Merkle hash root using H, for $h = \mathsf{Commit}(c_1, \ldots, c_\ell)$, where
for simplicity we let $\ell$ be a power of 2.

e) Answer PCP queries. Compute $p_{q_i} = \mathsf{Enc}_{pk_{pro}}(\mathsf{path}(q_i); \rho_{q_i})$ for $i \in [t]$
by running $\mathsf{Eval}_{pk_{pro}}$ on input $b_i$ (sent by $P_1$) and $(c_1, \ldots, c_\ell)$ (computed
above), where $\mathsf{path}(q_i) = \mathsf{Open}(h, i)$.

f) Proving correctness. Compute an encrypted proof $c_{\pi_{L_2}} = \mathsf{Enc}_{pk_{pro}}(\pi_{L_2})$
for proving that $(z_{pcp}, (q_1, \ldots, q_t), (c_{q_1}, \ldots, c_{q_t})) \in L_2$. This is done by
running $\mathsf{Eval}_{pk_{pro}}$ on input $z_{pcp}, (b_1, \ldots, b_t), (c_1, \ldots, c_\ell), (\gamma_1, \ldots, \gamma_\ell)$.

g) The complete message. Send $m_2 := (c, e_y, h, (p_{q_1}, \ldots, p_{q_t}), c_{\pi_{L_2}})$ to $P_1$.
Notice that $c_{q_i}$ is part of $\mathsf{path}(q_i)$ which is contained in $p_{q_i}$.

3. Verifying the second message $m_2$. $P_1$ decrypts c and obtains the result
of the computation $f(x,y) = \mathsf{Dec}_{sk_{comp}}(c)$. For each $i \in [t]$ it also decrypts
$\mathsf{path}(q_i) = \mathsf{Dec}_{sk_{pro}}(p_{q_i})$ and verifies that $\mathsf{path}(q_i)$ is correct with respect to the
root h. It then uses the leaves $c_{q_1}, \ldots, c_{q_t}$ and $\pi_{L_2} = \mathsf{Dec}_{sk_{pro}}(c_{\pi_{L_2}})$ together
with the common reference string $\sigma$ and verifies the correctness of $\pi_{L_2}$. If all
these checks succeed, then it outputs f(x,y), otherwise it aborts.

² We note that these public keys do not have to be associated with the fully
homomorphic encryption scheme. For convenience, we assume that they do in order to
avoid overload of parameters.
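The Merkle-tree commitment to the PCP and the path check performed in the last step can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 replaces the extractable hash H (it is not known to be extractable), the leaves are plain byte strings rather than the ciphertexts $c_i$, and the names `commit`, `open_path` and `verify` are hypothetical. It shows why each opened position costs only a logarithmic number of hash values.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()    # placeholder, not an extractable hash


def commit(leaves):
    """Return all tree levels; levels[0] holds hashed leaves, levels[-1][0] is the root."""
    n = len(leaves)
    assert n > 0 and n & (n - 1) == 0, "number of leaves must be a power of 2"
    levels = [[H(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def open_path(levels, index):
    """Sibling hashes from the leaf up to the root: log2(#leaves) elements."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])       # sibling of the current node
        index //= 2
    return path


def verify(root, leaf, index, path):
    """Recompute the root from a leaf and its authentication path."""
    node = H(leaf)
    for sibling in path:
        node = H(node + sibling) if index % 2 == 0 else H(sibling + node)
        index //= 2
    return node == root
```

For a proof of length $\ell$, opening t queried positions therefore sends t paths of $\log_2 \ell$ hash values each, which is where the polylogarithmic communication comes from.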


We claim the following theorem; the proof can be found in [DFH11].


Theorem 9 (Main). Assuming that $\Pi_E$ = (KeyGen, Enc, Dec, Eval, Refresh) is
semantically secure, (CRSGen, P, V) is a non-interactive zero-knowledge proof,
$(\mathsf{Prov}_{pcp}, (\mathsf{Ver}^0_{pcp}, \mathsf{Ver}^1_{pcp}))$ is a PCP system, $\{H_n\}_{n \in \mathbb{N}}$ is collision-resistant and
satisfies the EHF2 assumption, Protocol 3 evaluates f UC-securely against
malicious adversaries with polylogarithmic communication in the circuit-size of f.


We give a brief overview of our proof. We distinguish two corruption cases. Let $P_1$ be
controlled by an adversary A. In this case we face the difficulty of protecting the
privacy of $P_2$, since revealing bits from $\Pi$ so that the PCP verifier will be able to
validate the proof is insecure. Loosely speaking, privacy follows due to hashing
the committed proof rather than the proof itself. Thus, secrecy is obtained from
the hiding property of the commitment scheme. Simulating A’s view requires
the simulator to verify the correctness of the message $m_1$ received from A
as the honest $P_2$ would. Then it extracts A’s input, forwarding it to the trusted
party. Finally, upon receiving f(x,y) from the trusted party, it encrypts this
value under $pk_{comp}$ and sends it back to A. Now, since the simulator does not use
the real honest party’s input, y, it cannot construct a valid proof $\Pi$ and therefore
has to build the hash tree on commitments to the zero string. It further simulates
the NIZK proof for $L_2$. Indistinguishability follows due to: (1) Zero-knowledge
property of the proof system of $L_2$. (2) Semantic security of $\Pi_E$. (3) Refresh
algorithm of $\Pi_E$ that produces a ciphertext indistinguishable from a ciphertext
that encrypts f(x,y) directly (without going through homomorphic evaluation).
(4) Soundness of the proof system of L.

We now consider the case where $P_2$ is corrupt. Intuitively, security should
follow from semantic security of encryptions under $pk_{pro}$, soundness of the PCP
and the fact that $P_2$ is committed to a PCP string via sending the root of the
Merkle tree: by soundness of the PCP, the only way $P_2$ could cheat would be
to look at the encrypted PCP queries and adapt the PCP string it commits
to, to the specific queries that are asked. Supposedly, this is not possible by
semantic security. The technical difficulty, however, is that to have $P_2$ help us
conclude anything on which queries have been encrypted in a given ciphertext
(to make a reduction to semantic security), we would need to see the responses
$P_2$ sends back. Unfortunately, these are encrypted under the same key $pk_{pro}$, and
if we want to do a reduction to semantic security, we cannot know $sk_{pro}$ and so
cannot see the responses directly. This is solved by first observing that by the
extractability of the hash function, we can extract a Merkle tree T based on the
root of the tree sent by $P_2$, and hence also a PCP string (we can assume we
know $sk_y$ so we can decrypt the commitments containing PCP bits). We then
show that the encrypted paths $\mathsf{path}(q_i)$ must be contained in T, or else we could
break extractability or collision resistance of $H_n$. So the responses we want to see
will be embedded in the tree we can extract. The reduction to semantic security
can therefore ask for an encryption of one of two sets of queries $q^0$ or $q^1$. It
shows the ciphertext to $P_2$ and extracts a PCP string from the root sent by $P_2$.
Then if $q^0$ leads to accept with the extracted PCP, $P_1$ would also accept in a
real execution, so we guess that $q^0$ was the encrypted plaintext.

at least t < n servers need to contribute to the decryption process. A
threshold primitive is said to be robust if no coalition of t malicious servers can
prevent remaining honest servers from successfully completing private
key operations. So far, most practical non-interactive threshold cryptosystems,
where no interactive conversation is required among decryption
servers, were only proved secure against static corruptions. In the
adaptive corruption scenario (where the adversary can corrupt servers
at any time, based on its complete view), all existing robust threshold
encryption schemes that also resist chosen-ciphertext attacks (CCA) until
recently required interaction in the decryption phase. A specific method
(in composite order groups) for getting rid of interaction was recently 
suggested, leaving the question of more generic frameworks and con- 
structions with better security and better flexibility (i.e., compatibility 
with distributed key generation). 


This paper describes a general construction of adaptively secure ro- 
bust non-interactive threshold cryptosystems with chosen-ciphertext se- 
curity. We define the notion of all-but-one perfectly sound threshold hash 
proof systems that can be seen as (threshold) hash proof systems with 
publicly verifiable and simulation-sound proofs. We show that this no- 
tion generically implies threshold cryptosystems combining the afore- 
mentioned properties. Then, we provide efficient instantiations under 
well-studied assumptions in bilinear groups (e.g., in such groups of prime 
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1 Introduction 


Threshold cryptography avoids single points of failure by splitting keys 
into n > 1 shares which are held by servers in such a way that at least t out of 
n servers should contribute to private key operations. In (t,n)-threshold cryp- 
tosystems, an adversary breaking into up to t — 1 servers should not jeopardize 
the security of the system. 
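To illustrate the (t,n)-threshold idea, here is a short Python sketch of Shamir secret sharing. The text above does not say which sharing scheme is used, so Shamir's scheme, the field modulus and the function names `share` and `reconstruct` are all assumptions made for illustration. Any t shares determine the secret, while fewer than t shares are consistent with every possible secret.

```python
import secrets

PRIME = 2**127 - 1      # field modulus (a Mersenne prime); illustrative choice


def share(secret, t, n):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    # Random polynomial of degree t-1 with constant term equal to the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):          # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(i, poly(i)) for i in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at 0 from a list of (i, y_i) pairs."""
    secret = 0
    for i, y_i in shares:
        num, den = 1, 1
        for j, _ in shares:
            if j != i:
                num = num * (-j) % PRIME
                den = den * (i - j) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + y_i * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, with t = 3 and n = 5, any three of the five servers can jointly recover the key, whereas an adversary holding two shares learns nothing about it.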

Chosen-ciphertext security [45] (or IND-CCA for short) is widely recognized
as the standard security notion for public-key encryption. Securely distributing
the decryption procedure of CCA-secure public key schemes has proved to be
a challenging task. As discussed in, e.g., [49,25], the difficulty is that
decryption servers should return their partial decryption results, called “decryption
shares”, before knowing whether the incoming ciphertext is valid or not and
partial decryptions of ill-formed ciphertexts may leak useful information to the
adversary.

The first solution to this problem was put forth by Shoup and Gennaro [49]
and it requires the random oracle model [5], notably to render valid ciphertexts
publicly recognizable. In the standard model, Canetti and Goldwasser [15]
gave a threshold variant of the Cramer-Shoup encryption scheme [16].
Unfortunately, their scheme requires interaction among decryption servers to obtain
robustness (i.e., ensure that no coalition of t − 1 malicious servers can prevent
uncorrupted servers from successfully decrypting) as well as to render invalid
ciphertexts harmless. The approach of [15] consists in randomizing the decryption
process in such a way that partial decryptions of invalid ciphertexts are
uniformly random and thus meaningless to the adversary. To avoid the need to
jointly generate randomizers at each decryption, shareholders can alternatively
store a large number (i.e., proportional to the expected number of decryptions)
of pre-shared secrets, which does not scale well. Cramer, Damgård and Ishai
suggested [20] a method to generate randomizers without interaction but it is
only efficient for a small number of servers.

Other threshold variants of Cramer-Shoup were suggested and Abe notably
showed [1] how to achieve optimal resilience (namely, guarantee robustness
as long as the adversary corrupts a minority of t < n/2 servers) in the
Canetti-Goldwasser system [15]. In the last decade, generic constructions of CCA-secure
threshold cryptosystems with static security were put forth [24,52].


NON-INTERACTIVE SCHEMES. As an application of the Canetti-Halevi-Katz
(CHK) paradigm [18], Boneh, Boyen and Halevi [8] came up with the first fully
non-interactive robust CCA-secure threshold cryptosystem with a security proof
in the standard model: in their scheme, decryption servers can generate their
decryption shares without any communication with other servers. Their scheme
takes advantage of bilinear maps to publicly check the validity of ciphertexts,
which considerably simplifies the task of proving security in the threshold
setting. In addition, the validity of decryption shares can be verified in the same
way, which provides robustness. Similar applications of the CHK methodology
to threshold cryptography were studied in [13,36].

Recently, Wee [52] defined a framework allowing to construct non-interactive
threshold signatures and (chosen-ciphertext secure) threshold cryptosystems in
a static corruption model. He left as an open problem the extension of his
framework to the scenario of adaptive corruptions.


ADAPTIVE CORRUPTIONS. Most threshold systems (including [49051242518] ) 
have been analyzed in a static corruption model, where the adversary chooses 
which servers it wants to corrupt before the scheme is set up. Unfortunately, 
adaptive adversaries — who can choose whom to corrupt at any time, as a function 
of their entire view of the protocol execution — are known (see, e.g., [19]) to be 
strictly stronger. As discussed in [15], properly dealing with adaptive corruptions 
often comes at some substantial expense like a lower resilience. For example, the 
Canetti-Goldwasser system can be proved robust and adaptively secure when 
the threshold t is sufficiently small (typically, when $t = O(n^{1/2})$) but supporting
an optimal number of faulty servers is clearly preferable. 

Assuming reliable erasures, Canetti et al. [14] devised adaptively secure
protocols for the distributed generation of discrete-logarithm-based keys and DSA
signatures. Their techniques were re-used later on [3] in proactive [44] RSA
signatures. In 1999, Frankel, MacKenzie and Yung [26,27] independently showed
different methods to achieve adaptive security in the erasure-enabled setting.

Subsequently, Jarecki and Lysyanskaya [34] eliminated the need for erasures 
and gave an adaptively secure variant of the Canetti-Goldwasser threshold cryp- 
tosystem which appeals to interactive zero-knowledge proofs but is designed 
to remain secure in concurrent environments. Unfortunately, their scheme re- 
quires a fair amount of interaction among decryption servers. Abe and Fehr [2] 
showed how to dispense with zero-knowledge proofs in the Jarecki-Lysyanskaya 
construction so as to prove it secure in (a variant of) the universal composability 
framework but without completely eliminating interaction from the decryption 
procedure. As in most threshold variants of Cramer-Shoup, hedging against in- 
valid decryption queries requires an interactive (though off-line) randomness 
generation phase for each ciphertext, unless many pre-shared secrets are stored. 

Recently, the authors of this paper showed [39] an adaptively secure variant of
the Boneh-Boyen-Halevi construction [8] using groups of composite order and the
dual system encryption approach [50,38] that was initially applied to
identity-based encryption [48,10]. The scheme of [39] is based on a very specific use of the
Lewko-Waters techniques [38], which limits its applicability to composite order
groups and makes it hard to combine with existing adaptively secure distributed
key generation techniques. Also, the concrete security of this initial scheme is not
optimal as its security reduction is related to the number of decryption queries
made by the adversary. To solve these problems, we need a new approach and
different methods to analyze the security of schemes.


OUR CONTRIBUTION. Motivated by an open question raised by Wee [52] and
the limitations of [39], we define a general framework for constructing robust,
adaptively secure and fully non-interactive threshold cryptosystems with
chosen-ciphertext security. Our goal is to have simple and practical client/server
protocols, as advocated in [49, Section 2.5], and even avoid the off-line
interactive randomness generation stage which is usually needed in threshold versions
of Cramer-Shoup.

To this end, we also appeal to hash proof systems (HPS) [17] and take advantage
of the property that, in security reductions using the techniques of [16,17],
the simulator knows the private keys, which is convenient to answer adaptive
corruption queries. Indeed, when the reduction has to reveal the internal state
of dynamically-corrupted servers, it is not bound to a particular set of available
shares since it knows them all. At the same time, we depart from [15] in that the
validity of ciphertexts is made publicly verifiable (which eliminates the need
to randomize the decryption operation) using non-interactive proofs satisfying
some form of simulation-soundness [46]: in the security reduction, the simulator
should be able to generate a proof for a possibly false statement but the
adversary should be unable to do it on its own, even after having seen a fake
proof.

To this end, we define the notion of all-but-one perfectly sound threshold hash 
proof systems that can be seen as (threshold) hash proof systems with publicly
verifiable proofs (as opposed to the designated-verifier proofs used in traditional
HPS [17]). More precisely, each proof is associated with a tag, in the same way
as ciphertexts are associated with tags in [41,36]. Real public parameters are
indistinguishable from alternative parameters that are generated in an all-but-one
mode, which is only used in the security analysis. In the latter mode, non- 
interactive proofs are perfectly sound on all tags, except for a single specific tag 
where some trapdoor makes it possible to simulate proofs for false statements. 
While our primitive bears similarities with Wee’s extractable hash proof systems 
[51,52] (where hash proof systems are also associated with tags), it is different in
that no extractability property is required and proofs are always used as proofs 
of membership. 

Using all-but-one perfectly sound threshold hash proof systems, we generically 
construct adaptively secure robust non-interactive threshold cryptosystems with 
optimal resilience. An additional benefit of this approach is to provide a better 
concrete security as the security proof requires a constant number of game tran- 
sitions whereas, in [89], the number of games is proportional to the number of 
decryption queries. 

Then, we show three concrete instantiations using number theoretic assump- 
tions in bilinear groups. The first one uses groups whose order is a product of 
two primes (whereas three primes are needed in [39]). Our second and third 
schemes rely on the Groth-Sahai proof systems [31] in their instantiations based
on the Decision Linear [9] and symmetric eXternal Diffie-Hellman assumptions
[47]. The latter two constructions operate over bilinear groups of prime order,
which allows for a significantly better efficiency than composite order groups (as 
discussed in [28]) and makes them much easier to combine with known adaptively 
secure discrete-log-based distributed key generation protocols. For example, in 
the erasure-free setting, the protocols of [34,2] can be used so as to eliminate the
need for a trusted dealer at the same time as the reliance on reliable erasures. 
2 Background and Definitions


2.1 Definitions for Threshold Public Key Encryption 


A non-interactive (t, n)-threshold encryption scheme is a set of algorithms with 
these specifications. 


Setup($\lambda, t, n$): given a security parameter $\lambda$ and integers $t, n \in \mathrm{poly}(\lambda)$ (with
$1 \le t \le n$) denoting the number of decryption servers $n$ and the threshold $t$,
this algorithm outputs $(PK, VK, SK)$, where $PK$ is the public key, $SK =
(SK_1, \ldots, SK_n)$ is a vector of private-key shares and $VK = (VK_1, \ldots, VK_n)$
is a vector of verification keys. Decryption server $i$ is given the private key
share $(i, SK_i)$. For each $i \in \{1, \ldots, n\}$, the verification key $VK_i$ will be used
to check the validity of decryption shares generated using $SK_i$.

Encrypt($PK, M$): is a randomized algorithm that, given a public key $PK$ and
a plaintext $M$, outputs a ciphertext $C$.

Ciphertext-Verify($PK, C$): takes as input a public key $PK$ and a ciphertext
$C$. It outputs 1 if $C$ is deemed valid w.r.t. $PK$ and 0 otherwise.

Share-Decrypt($PK, i, SK_i, C$): on input of a public key $PK$, a ciphertext $C$
and a private-key share $(i, SK_i)$, this (possibly randomized) algorithm outputs
a special symbol $(i, \perp)$ if Ciphertext-Verify($PK, C$) = 0. Otherwise,
it outputs a decryption share $\mu_i = (i, \hat{\mu}_i)$.

Share-Verify($PK, VK_i, C, \mu_i$): takes in $PK$, the verification key $VK_i$, a
ciphertext $C$ and a purported decryption share $\mu_i = (i, \hat{\mu}_i)$. It outputs either
1 or 0. In the former case, $\mu_i$ is said to be a valid decryption share. We adopt
the convention that $(i, \perp)$ is an invalid decryption share.

Combine($PK, VK, C, \{\mu_i\}_{i \in S}$): given $PK$, $VK$, $C$ and a subset $S \subset \{1, \ldots, n\}$
of size $t = |S|$ with decryption shares $\{\mu_i\}_{i \in S}$, this algorithm outputs either
a plaintext $M$ or $\perp$ if the set contains invalid decryption shares.


CHOSEN-CIPHERTEXT SECURITY. We use a game-based definition of chosen-ciphertext
security which is akin to the one of [49,8] with the difference that the
adversary can adaptively decide which parties it wants to corrupt.


Definition 1. A non-interactive $(t,n)$-Threshold Public Key Encryption scheme
is secure against chosen-ciphertext attacks (or IND-CCA2 secure) and adaptive
corruptions if no PPT adversary has non-negligible advantage in this game:


1. The challenger runs Setup($\lambda, t, n$) to obtain $PK$, a vector of private key
shares $SK = (SK_1, \ldots, SK_n)$ and verification keys $VK = (VK_1, \ldots, VK_n)$.
It gives $PK$ and $VK$ to the adversary $\mathcal{A}$ and keeps $SK$ to itself.
2. The adversary $\mathcal{A}$ adaptively makes the following kinds of queries:
- Corruption query: $\mathcal{A}$ chooses $i \in \{1, \ldots, n\}$ and obtains $SK_i$. No more
than $t - 1$ private key shares can be obtained by $\mathcal{A}$ in the whole game.
- Decryption query: $\mathcal{A}$ chooses an index $i \in \{1, \ldots, n\}$ and a ciphertext $C$.
The challenger replies with $\mu_i$ = Share-Decrypt($PK, i, SK_i, C$).
3. The adversary $\mathcal{A}$ chooses two equal-length messages $M_0, M_1$ and obtains
$C^* $ = Encrypt($PK, M_\beta$) for some random bit $\beta \stackrel{R}{\leftarrow} \{0,1\}$.
4. $\mathcal{A}$ makes further queries as in step 2 but is not allowed to make decryption
queries on $C^*$.
5. $\mathcal{A}$ outputs a bit $\beta'$ and is deemed successful if $\beta' = \beta$. As usual, $\mathcal{A}$'s advantage
is measured as the distance $\mathrm{Adv}(\mathcal{A}) = |\Pr[\beta' = \beta] - \tfrac{1}{2}|$.


CONSISTENCY. A $(t,n)$-Threshold Encryption scheme provides decryption consistency
if no PPT adversary has non-negligible advantage in a three-stage game
where stages 1 and 2 are identical to those of Definition 1 with the difference
that the adversary $\mathcal{A}$ is allowed to obtain all private key shares (alternatively,
$\mathcal{A}$ can directly obtain $SK$ at the beginning of the game). In stage 3, $\mathcal{A}$ outputs
a ciphertext $C$ and two $t$-sets of decryption shares $\Gamma = \{\mu_1, \ldots, \mu_t\}$ and
$\Gamma' = \{\mu'_1, \ldots, \mu'_t\}$. The adversary $\mathcal{A}$ is declared successful if


1. Ciphertext-Verify($PK, C$) = 1.
2. $\Gamma$ and $\Gamma'$ only consist of valid decryption shares.
3. Combine($PK, VK, C, \Gamma$) $\ne$ Combine($PK, VK, C, \Gamma'$).


We note that condition 1 prevents an adversary from trivially winning by outputting
an invalid ciphertext, for which distinct sets of key shares may give
different results. This definition of consistency is identical to the one of [49,8]
with the difference that $\mathcal{A}$ can adaptively corrupt servers.


2.2 Hardness Assumptions in Composite Order Groups 


On one occasion, we appeal to groups $(G, G_T)$ of composite order $N = p_1 p_2$,
where $p_1$ and $p_2$ are primes, with a bilinear map $e : G \times G \to G_T$ (i.e., for which
$e(g^a, h^b) = e(g, h)^{ab}$ for any $g, h \in G$ and $a, b \in \mathbb{Z}_N$). In the notations hereafter,
for each $i \in \{1, 2\}$, $G_{p_i}$ stands for the subgroup of order $p_i$ in $G$.


Definition 2 ([11]). In a group $G$ of composite order $N$, the Subgroup Decision
(SD) problem is, given $(g \in G_{p_1}, h \in G)$ and $\eta$, to decide whether $\eta \in G_{p_1}$
or $\eta \in_R G$. The Subgroup Decision assumption states that, for any PPT
distinguisher $D$, the SD problem is infeasible.


2.3 Assumptions in Prime Order Groups 


We also use bilinear maps $e : G \times \hat{G} \to G_T$ over groups of prime order $p$. We
will work in symmetric pairing configurations, where $G = \hat{G}$, and sometimes in
asymmetric configurations, where $G \ne \hat{G}$.

In the symmetric setting $(G, G_T)$, we rely on the following assumption.


Definition 3 ([9]). In a group $G$ of prime order $p$, the Decision Linear
Problem (DLIN) is to distinguish the distributions $(g, g^a, g^b, g^{ac}, g^{bd}, g^{c+d})$ and
$(g, g^a, g^b, g^{ac}, g^{bd}, g^z)$, with $a, b, c, d, z \stackrel{R}{\leftarrow} \mathbb{Z}_p$. The Decision Linear Assumption
is the intractability of DLIN for any PPT distinguisher $D$.
The problem amounts to deciding if vectors $\vec{g}_1 = (g^a, 1, g)$, $\vec{g}_2 = (1, g^b, g)$ and
$\vec{g}_3 = (g^{ac}, g^{bd}, g^\delta)$ are linearly dependent (i.e., if $\delta = c + d$) or not.
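The two DLIN distributions can be illustrated with a small sketch. It assumes a toy group that is not part of the paper (the order-1019 subgroup of the integers modulo 2039, with generator 4) and hypothetical helper names `dlin_tuple` and `is_linear`. The check uses the hidden exponents $c, d$, which is exactly the information a DLIN distinguisher does not have.

```python
import secrets

# Toy parameters (assumption, far too small to be secure): the subgroup of
# order Q in Z_P^*, where P = 2Q + 1, generated by G.
P, Q, G = 2039, 1019, 4


def dlin_tuple(linear: bool):
    """Return (g, g^a, g^b, g^ac, g^bd, g^delta) and the hidden (c, d)."""
    a, b = (1 + secrets.randbelow(Q - 1) for _ in range(2))
    c, d = (secrets.randbelow(Q) for _ in range(2))
    delta = (c + d) % Q
    if not linear:
        # Shift delta by a non-zero amount so that delta != c + d (mod Q).
        delta = (delta + 1 + secrets.randbelow(Q - 1)) % Q
    inst = (G, pow(G, a, P), pow(G, b, P),
            pow(G, a * c, P), pow(G, b * d, P), pow(G, delta, P))
    return inst, (c, d)


def is_linear(inst, c, d) -> bool:
    """Check delta = c + d given the witnesses c and d."""
    g, ga, gb, gac, gbd, gdelta = inst
    return (gac == pow(ga, c, P) and gbd == pow(gb, d, P)
            and gdelta == pow(g, (c + d) % Q, P))
```

Given the witnesses the check is immediate; the assumption states that without them the two kinds of tuples cannot be told apart efficiently in suitably large groups.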

In asymmetric bilinear groups $(G, \hat{G}, G_T)$, we assume the hardness of the Decision
Diffie-Hellman (DDH) problem in $G$ and $\hat{G}$. This implies the unavailability
of efficiently computable isomorphisms between $G$ and $\hat{G}$. This assumption
is called Symmetric eXternal Diffie-Hellman (SXDH) assumption. Given
vectors $\vec{u}_1 = (g, h)$, $\vec{u}_2 = (g^a, h^c)$ in $G^2$ or $\hat{G}^2$, the SXDH assumption asserts the
infeasibility of deciding whether $\vec{u}_1$ and $\vec{u}_2$ are linearly dependent (i.e., whether
$a = c \bmod p$).


3 All-But-One Perfectly Sound Threshold Hash Proof 
Systems 


Let $\mathcal{C}$, $\mathcal{K}$ and $\mathcal{K}'$ be sets and let $\mathcal{V} \subset \mathcal{C}$ be a subset. Let also $\mathcal{R}$ be a space where
random coins can be chosen. We mandate that $\mathcal{V}$, $\mathcal{K}$, $\mathcal{K}'$ and $\mathcal{R}$ be of exponential
size in $\lambda$, where $\lambda \in \mathbb{N}$ is a security parameter. In addition, $\mathcal{C}$, $\mathcal{V}$ and $\mathcal{C} \setminus \mathcal{V}$ should
be efficiently samplable and we also require the set $\mathcal{K}$ to form a group for some
binary operation, which is denoted by $\odot$ hereafter.

An all-but-one perfectly sound threshold hash proof system for $(\mathcal{C}, \mathcal{V}, \mathcal{K}, \mathcal{K}', \mathcal{R})$
is a tuple (SetupSound, SetupABO, Sample, Prove, SimProve, Verify,
PubEval, SharePrivEval, ShareEvalVerify, Combine) of efficient algorithms with the
following specifications.


SetupSound($\lambda, t, n$): given a security parameter $\lambda \in \mathbb{N}$ and integers $t, n \in
\mathrm{poly}(\lambda)$, this algorithm outputs a public key $pk$, a vector of private key shares
$(sk_1, \ldots, sk_n)$ and verification keys $(vk_1, \ldots, vk_n)$.

SetupABO($\lambda, t, n, tag^*$): takes as input a security parameter $\lambda \in \mathbb{N}$, integers
$t, n \in \mathrm{poly}(\lambda)$ and a tag $tag^*$. It outputs a public key $pk$, private key shares
$(sk_1, \ldots, sk_n)$, the corresponding verification keys $(vk_1, \ldots, vk_n)$ as well as a
simulation trapdoor $\tau$. It is important that $\tau$ be independent of $\{sk_i\}_{i=1}^n$.

Sample($pk$): is a probabilistic algorithm that takes as input a public key $pk$. It
draws random coins $r \stackrel{R}{\leftarrow} \mathcal{R}$ and outputs an element $\Phi \in \mathcal{V}$ along with the
random coins $r$ that will serve as a witness explaining $\Phi$ as an element of $\mathcal{V}$.

Prove($pk, tag, r, \Phi$): takes in a public key $pk$, a tag $tag$, an element $\Phi \in \mathcal{V}$ and
the random coins $r \in \mathcal{R}$ that were used to sample $\Phi$. It generates a
non-interactive proof $\pi_{\mathcal{V}}$ that $\Phi \in \mathcal{V}$.

SimProve($pk, \tau, tag, \Phi$): takes as input a public key $pk$ and a simulation trapdoor
$\tau$ produced by SetupABO($\lambda, t, n, tag^*$), a tag $tag$ and an element $\Phi \in \mathcal{C}$. If
$tag \ne tag^*$, the algorithm outputs $\perp$. If $tag = tag^*$, the algorithm produces
a simulated NIZK proof $\pi_{\mathcal{V}}$ that $\Phi \in \mathcal{V}$.

Verify($pk, tag, \Phi, \pi_{\mathcal{V}}$): takes as input a public key $pk$, a tag $tag$, an element $\Phi \in \mathcal{C}$
and a purported proof $\pi_{\mathcal{V}}$. It outputs 1 if and only if $\pi_{\mathcal{V}}$ is deemed a valid
proof that $\Phi \in \mathcal{V} \subset \mathcal{C}$.

PubEval($pk, r, \Phi$): takes as input a public key $pk$, an element $\Phi \in \mathcal{V}$ and the
random coins $r \in \mathcal{R}$ such that $(r, \Phi) \leftarrow$ Sample($pk$). It outputs a value
$K \in \mathcal{K}$, which is called public evaluation of $\Phi$.
SharePrivEval($pk, sk_i, \Phi$): is a deterministic algorithm that takes in a public key
$pk$, a private key share $sk_i$ and an element $\Phi \in \mathcal{C}$. It outputs a value $K_i \in
\mathcal{K}'$, called private evaluation share, and a proof $\pi_{K_i}$ that $K_i$ was evaluated
correctly.

ShareEvalVerify($pk, vk_i, \Phi, K_i, \pi_{K_i}$): given a public key $pk$, a verification key
$vk_i$, an element $\Phi \in \mathcal{C}$, a private evaluation share $K_i \in \mathcal{K}'$ and its proof $\pi_{K_i}$,
this algorithm outputs 1 if $\pi_{K_i}$ is considered as a valid proof of the correct
evaluation of $K_i$. Otherwise, it outputs 0.

Combine($pk, \Phi, \{(K_i, \pi_{K_i})\}_{i \in S}$): takes as input a public key $pk$, an element
$\Phi \in \mathcal{C}$ and a set of $t$ pairs $\{(K_i, \pi_{K_i})\}_{i \in S}$, where $S \subset \{1, \ldots, n\}$, each
one of which consists of a private evaluation share $K_i \in \mathcal{K}'$ and its proof
$\pi_{K_i}$. If ShareEvalVerify($pk, vk_i, \Phi, K_i, \pi_{K_i}$) = 0 for some $i \in S$, it outputs $\perp$.
Otherwise, it outputs a value $K \in \mathcal{K}$.


We also define this algorithm which is implied by the above ones but will be
convenient to use.


PrivEval($pk, \{sk_i\}_{i \in S}, \Phi$): given a public key $pk$, a set of private key shares
$\{sk_i\}_{i \in S}$ where $S$ is an arbitrary $t$-subset of $\{1, \ldots, n\}$, and an element $\Phi \in \mathcal{C}$,
this algorithm outputs the result of Combine($pk, \Phi, \{(K_i, \pi_{K_i})\}_{i \in S}$) where
$(K_i, \pi_{K_i}) \leftarrow$ SharePrivEval($pk, sk_i, \Phi$) for each $i \in S$.


The following properties are required from these algorithms and the sets
$(\mathcal{C}, \mathcal{V}, \mathcal{K}, \mathcal{K}', \mathcal{R})$.


(SETUP INDISTINGUISHABILITY): For any integers $(\lambda, t, n)$ with $1 \le t \le n$ and
any tag $tag^*$, the output of SetupSound($\lambda, t, n$) is computationally
indistinguishable from the outputs $(pk, \{sk_i\}_{i=1}^n, \{vk_i\}_{i=1}^n)$ of SetupABO($\lambda, t, n, tag^*$).

(CORRECTNESS AND PUBLIC EVALUABILITY ON $\mathcal{V}$): For any $(pk, \{sk_i\}_{i=1}^n,
\{vk_i\}_{i=1}^n)$ returned by SetupSound or SetupABO, if $(r, \Phi) \leftarrow$ Sample($pk$) (and
thus $\Phi \in \mathcal{V}$), it holds that:

1. For any $i \in \{1, \ldots, n\}$, if $(K_i, \pi_{K_i}) \leftarrow$ SharePrivEval($pk, sk_i, \Phi$), the
private evaluation share $K_i \in \mathcal{K}'$ is uniquely determined by $(pk, vk_i)$ and $\Phi$.
Moreover, the proof $\pi_{K_i}$ satisfies ShareEvalVerify($pk, vk_i, \Phi, K_i, \pi_{K_i}$) = 1.

2. For any $t$-subset $S \subset \{1, \ldots, n\}$, combining the corresponding private
evaluation shares allows recomputing the public evaluation of $\Phi$: namely,
PubEval($pk, r, \Phi$) = PrivEval($pk, \{sk_i\}_{i \in S}, \Phi$).

(UNIVERSALITY): For any $(pk, \{sk_i\}_{i=1}^n, \{vk_i\}_{i=1}^n)$ produced by SetupSound or
SetupABO and any $\Phi \in \mathcal{C} \setminus \mathcal{V}$, for any subset $S \subset \{1, \ldots, n\}$ of size $|S| = t - 1$,
the statistical distance

$$\Delta\big[(pk, \{vk_i\}_{i=1}^n, \{sk_i\}_{i \in S}, \Phi, \mathrm{PrivEval}(pk, \{sk_i\}_{i=1}^n, \Phi)),\;
(pk, \{vk_i\}_{i=1}^n, \{sk_i\}_{i \in S}, \Phi, K)\big],$$

where $K \stackrel{R}{\leftarrow} \mathcal{K}$, should be negligible.
(ALL-BUT-ONE SOUNDNESS): For all integers $(\lambda, t, n)$ such that $1 \le t \le n$, any
tag $tag^*$ and any outputs $(pk, \{sk_i\}_{i=1}^n, \{vk_i\}_{i=1}^n, \tau)$ of SetupABO($\lambda, t, n, tag^*$),
these conditions are satisfied.


1. For any $tag \ne tag^*$, proofs are always perfectly sound. Namely, if a proof
$\pi_{\mathcal{V}}$ satisfies Verify($pk, tag, \Phi, \pi_{\mathcal{V}}$) = 1 for some $\Phi \in \mathcal{C}$, then it necessarily
holds that $\Phi \in \mathcal{V}$.


2. For any $\Phi \in \mathcal{C}$, the trapdoor $\tau$ allows generating a simulated proof
$\pi_{\mathcal{V}} \leftarrow$ SimProve($pk, \tau, tag^*, \Phi$) such that Verify($pk, tag^*, \Phi, \pi_{\mathcal{V}}$) = 1 (note
that $\pi_{\mathcal{V}}$ is a proof for a false statement if $\Phi \in \mathcal{C} \setminus \mathcal{V}$). Moreover, if $\Phi \in \mathcal{V}$,
the simulated proof $\pi_{\mathcal{V}}$ should be perfectly indistinguishable from a real
proof (i.e., that would be generated by Prove using a witness $r \in \mathcal{R}$ of
the fact that $\Phi \in \mathcal{V}$).


(SIMULATABILITY OF SHARE PROOFS): For all $(\lambda, t, n)$ with $1 \le t \le n$, any tag
$tag^*$, any outputs $(pk, \{sk_i\}_{i=1}^n, \{vk_i\}_{i=1}^n, \tau)$ of SetupABO($\lambda, t, n, tag^*$) and
any $\Phi \in \mathcal{C}$, the proofs $\pi_{K_i}$ obtained as $(K_i, \pi_{K_i}) \leftarrow$ SharePrivEval($pk, sk_i, \Phi$)
should be simulatable using the trapdoor $\tau$ instead of $\{sk_i\}_{i=1}^n$. Using $\tau$ and
$(pk, \{vk_i\}_{i=1}^n, \Phi)$, an efficient algorithm $\mathcal{S}$ should be able to produce simulated
proofs $\pi_{K_i}$ that are perfectly indistinguishable from real proofs.


(CONSISTENCY): For all $(\lambda, t, n)$ with $1 \le t \le n$, any output $(pk, \{(vk_i, sk_i)\}_{i=1}^n)$ of
SetupSound($\lambda, t, n$), given $(pk, \{(vk_i, sk_i)\}_{i=1}^n)$, it should be computationally
infeasible to come up with a triple $(tag, \Phi, \pi_{\mathcal{V}})$ as well as two distinct $t$-sets
$\Gamma = \{(K_{i_1}, \pi_{K_{i_1}}), \ldots, (K_{i_t}, \pi_{K_{i_t}})\}$ and $\Gamma' = \{(K'_{j_1}, \pi_{K'_{j_1}}), \ldots, (K'_{j_t}, \pi_{K'_{j_t}})\}$
with $i_k, j_k \in \{1, \ldots, n\}$ for each $k \in \{1, \ldots, t\}$, such that the following
three conditions are satisfied: (i) Verify($pk, tag, \Phi, \pi_{\mathcal{V}}$) = 1; (ii) for each
$k \in \{1, \ldots, t\}$, it holds that ShareEvalVerify($pk, vk_{i_k}, \Phi, K_{i_k}, \pi_{K_{i_k}}$) = 1 and
ShareEvalVerify($pk, vk_{j_k}, \Phi, K'_{j_k}, \pi_{K'_{j_k}}$) = 1; (iii) $\Gamma$ and $\Gamma'$ result in distinct
combinations: Combine($pk, \Phi, \Gamma$) $\ne$ Combine($pk, \Phi, \Gamma'$).


(SUBSET MEMBERSHIP HARDNESS): membership in $\mathcal{C}$ should be easy to check
but membership in $\mathcal{V}$ should not. Moreover, this should hold even if $\tau$ is
given. Namely, for all integers $(\lambda, t, n)$ such that $1 \le t \le n$, any tag $tag^*$
and any outputs $(pk, \{sk_i\}_{i=1}^n, \{vk_i\}_{i=1}^n, \tau)$ of SetupABO($\lambda, t, n, tag^*$), for any
PPT distinguisher $D$, it must hold that:

$$\mathrm{Adv}^{\mathrm{SM}}(D) = \big|\Pr[D(\mathcal{C}, \mathcal{V}, C_1, \tau) = 1 \mid C_1 \stackrel{R}{\leftarrow} \mathcal{C} \setminus \mathcal{V}]
- \Pr[D(\mathcal{C}, \mathcal{V}, C_0, \tau) = 1 \mid C_0 \stackrel{R}{\leftarrow} \mathcal{V}]\big| \in \mathrm{negl}(\lambda).$$


In the definition of the subset membership hardness property, the trapdoor $\tau$
should not carry any side information helping the distinguisher. For this reason,
the latter receives $\tau$ as part of its input.
4 Adaptively Secure Robust Non-interactive
CCA2-Secure Threshold Cryptosystems from 
All-But-One Perfectly Sound Threshold Hash Proof 
Systems 


Let us assume sets $(\mathcal{C}, \mathcal{V}, \mathcal{K}, \mathcal{K}', \mathcal{R})$ for which we have an all-but-one perfectly sound
threshold hash proof system $\Pi^{\mathrm{ABO\text{-}THPS}}$ = (SetupSound, SetupABO, Sample, Prove,
SimProve, Verify, PubEval, SharePrivEval, ShareEvalVerify, Combine) that satisfies
the conditions specified in Section 3. We assume that messages are in $\mathcal{K}$. The
generic construction of a CCA2-secure threshold cryptosystem goes as follows.


Keygen($\lambda, t, n$): given integers $\lambda, t, n \in \mathbb{N}$, choose a one-time signature scheme
$\Sigma$ = (Gen, Sig, Ver), generate $(pk, \{sk_i\}_{i=1}^n, \{vk_i\}_{i=1}^n) \leftarrow$ SetupSound($\lambda, t, n$)
and output $(PK, SK, VK)$, where the vectors of private key shares and
verification keys are defined as $SK = (sk_1, \ldots, sk_n)$ and $VK = (vk_1, \ldots, vk_n)$,
respectively. The public key is $PK = (pk, \Sigma)$.

Encrypt($M, PK$): to encrypt a message $M \in \mathcal{K}$ using $PK = (pk, \Sigma)$,

1. Generate a one-time signature key pair $(SSK, SVK) \leftarrow \Sigma$.Gen($\lambda$).

2. Choose $r \stackrel{R}{\leftarrow} \mathcal{R}$, compute $(r, \Phi) \leftarrow$ Sample($pk, r$) and blind the message
as $C_0 = M \odot$ PubEval($pk, r, \Phi$).

3. Generate a proof $\pi_{\mathcal{V}} \leftarrow$ Prove($pk, SVK, r, \Phi$) that $\Phi \in \mathcal{V}$ with respect to
the tag $SVK$.

4. Output $C = (SVK, C_0, \Phi, \pi_{\mathcal{V}}, \sigma)$, where $\sigma = \Sigma$.Sig($SSK, (C_0, \Phi, \pi_{\mathcal{V}})$).


Ciphertext-Verify($PK, C$): parse the ciphertext $C$ as $C = (SVK, C_0, \Phi, \pi_{\mathcal{V}}, \sigma)$
and $PK$ as $(pk, \Sigma)$. Return 1 if it holds that $\Sigma$.Ver($SVK, (C_0, \Phi, \pi_{\mathcal{V}}), \sigma$) = 1
and Verify($pk, SVK, \Phi, \pi_{\mathcal{V}}$) = 1. Otherwise, return 0.


Share-Decrypt($SK_i, C$): given $SK_i = sk_i$ and $C = (SVK, C_0, \Phi, \pi_{\mathcal{V}}, \sigma)$, return
$(i, \perp)$ if it turns out that Ciphertext-Verify($PK, C$) = 0. Otherwise,
compute a pair $(K_i, \pi_{K_i}) \leftarrow$ SharePrivEval($pk, sk_i, \Phi$) and return $\mu_i = (i, \hat{\mu}_i)$
where $\hat{\mu}_i = (K_i, \pi_{K_i})$.

Share-Verify($PK, VK_i, C, (i, \hat{\mu}_i)$): parse $C$ as $(SVK, C_0, \Phi, \pi_{\mathcal{V}}, \sigma)$. If $\hat{\mu}_i = \perp$
or if $\hat{\mu}_i$ cannot be properly parsed as a pair $(K_i, \pi_{K_i})$, return 0. Otherwise,
return 1 if ShareEvalVerify($pk, vk_i, \Phi, K_i, \pi_{K_i}$) = 1 and 0 otherwise.

Combine($PK, VK, C, \{(i, \hat{\mu}_i)\}_{i \in S}$): parse $C$ as $(SVK, C_0, \Phi, \pi_{\mathcal{V}}, \sigma)$. Return $\perp$
if there exists $i \in S$ such that Share-Verify($PK, VK_i, C, (i, \hat{\mu}_i)$) = 0 or if
Ciphertext-Verify($PK, C$) = 0. Otherwise, compute the combined value
$K$ = Combine($pk, \Phi, \{(K_i, \pi_{K_i})\}_{i \in S}$) $\in \mathcal{K}$, which unveils $M = C_0 \odot K^{-1}$.


We observe that there is no need to bind the one-time verification key $SVK$ to
the ciphertext components $(C_0, \Phi, \pi_{\mathcal{V}})$ in any other way than by using it as a tag
to compute the non-interactive proof $\pi_{\mathcal{V}}$. Indeed, if the adversary attempts to
re-use parts $(C_0^*, \Phi^*, \pi_{\mathcal{V}}^*)$ of the challenge ciphertext and simply replaces the
one-time verification key $SVK^*$ by a verification key $SVK$ of its own, it will be forced
to compute a proof $\pi_{\mathcal{V}}$ that corresponds to the same $\Phi^*$ as in the challenge phase
but under the new tag $SVK$. Our security proof shows that this is infeasible
as long as $\Pi^{\mathrm{ABO\text{-}THPS}}$ satisfies the properties of setup indistinguishability and
all-but-one soundness.

The consistency property of the threshold encryption scheme is trivially implied
by that of $\Pi^{\mathrm{ABO\text{-}THPS}}$ and we focus on proving its IND-CCA security. In
the threshold setting, adaptive security is achieved by taking advantage of the 
fact that, in security reductions using hash proof systems, the simulator typically 
knows the private key and can thus answer adaptive queries at will. At the same 
time, invalid ciphertexts are harmless as they are made publicly recognizable 
due to the use of non-interactive proofs of validity: as long as these proofs are 
perfectly sound in all decryption queries, the simulator is guaranteed not to leak 
too much information about the particular private key it is using. 

The main problem to solve is thus to make sure that only the simulator can 
simulate a fake proof in the challenge phase and this is where the all-but-one 
soundness property is handy. 


Theorem 1. The above threshold cryptosystem is IND-CCA secure against adaptive
corruptions assuming that: (i) $\Pi^{\mathrm{ABO\text{-}THPS}}$ is an all-but-one perfectly sound
hash proof system; (ii) $\Sigma$ is a strongly unforgeable one-time signature.


Proof. The proof is given in the full version of the paper. 


5 Instantiations 


5.1 Construction in Groups of Composite Order $N = p_1 p_2$


The construction relies on a hash proof system in a group $G$ of composite order
$N = p_1 p_2$ and it is conceptually close to the one in [33] (notably because it builds
on a $\log p_2$-entropic hash proof system, as defined in ). The public key includes
group elements $(g, X = g^x)$ in the subgroup $G_{p_1}$ of order $p_1$ and the sets $\mathcal{C}$ and
$\mathcal{V}$ are defined to be $G$ and $G_{p_1}$, respectively. The sampling algorithm returns
$\Phi = g^r \in G_{p_1}$ for a random exponent $r \stackrel{R}{\leftarrow} \mathbb{Z}_N$, which allows publicly evaluating
$H(X^r) = H(\Phi^x)$ using a pairwise independent hash function $H : G \to \{0,1\}^\ell$.
Since the public key is independent of $x \bmod p_2$, for any $\Phi \in G$ that has a
non-trivial component of order $p_2$, the "hash value" $\Phi^x$ has exactly $\log p_2$ bits
of min-entropy and the leftover hash lemma implies that $H(\Phi^x)$ is statistically
close to the uniform distribution in $\{0,1\}^\ell$ when $\ell$ is sufficiently small.

In order to turn the scheme into an all-but-one perfectly sound threshold
HPS, we need a mechanism that proves membership in the subgroup $G_{p_1}$ and
guarantees the perfect soundness of proofs of membership for all tags $tag \in \mathbb{Z}_N$
such that $tag \ne tag^*$. To this end, we use additional public parameters $(u, v) \in
G^2$ and a tag-dependent group element $u^{tag} \cdot v$ that will serve as a common reference
string to generate a non-interactive proof that $\Phi \in G_{p_1}$. Membership in $G_{p_1}$
can be non-interactively proved using a technique that can be traced back to
[30]. The proof consists of a group element $\pi_{SD} \in G$ satisfying the equality
$e(\Phi, u^{tag} \cdot v) = e(g, \pi_{SD})$, which ensures that $\Phi \in G_{p_1}$ as long as $u^{tag} \cdot v$ has a $G_{p_2}$
component. In the public parameters produced by SetupABO, the value $u^{tag} \cdot v$
thus has to be in $G \setminus G_{p_1}$ for any $tag \ne tag^*$ in such a way that generating fake
proofs that $\Phi \in G_{p_1}$ is impossible. At the same time, $u^{tag^*} \cdot v$ should be in $G_{p_1}$
so that fake proofs can be generated for $tag^*$.
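The all-but-one behaviour of the verification equation $e(\Phi, u^{tag} \cdot v) = e(g, \pi_{SD})$ can be checked in a toy model. The sketch below is an illustration under strong assumptions and is not the paper's implementation: each group element is represented by its discrete logarithm modulo $N = p_1 p_2$, so the pairing becomes multiplication of exponents, and all names (`setup_abo`, `crs`, `prove`, `sim_prove`, `verify`) and the parameters 11 and 13 are hypothetical. Such a representation offers no security and only shows the algebra.

```python
# Toy model (assumption, insecure): the element g0^k of a cyclic group G of
# order N = p1 * p2 is represented by its exponent k mod N, so the pairing
# e(g0^a, g0^b) = e(g0, g0)^(a*b) is modelled as a * b mod N.
P1, P2 = 11, 13
N = P1 * P2
G_GEN = P2  # exponent of a generator g of the order-p1 subgroup G_{p1}
U = 1       # exponent of u, which has a non-trivial G_{p2} component


def pairing(a: int, b: int) -> int:
    return (a * b) % N


def in_subgroup_p1(k: int) -> bool:
    # G_{p1} is the set of exponents that are multiples of p2.
    return k % P2 == 0


def setup_abo(tag_star: int, alpha: int) -> int:
    # v = u^(-tag*) * g^alpha, written additively on exponents.
    return (-tag_star * U + alpha * G_GEN) % N


def crs(tag: int, v: int) -> int:
    # The tag-dependent element u^tag * v.
    return (tag * U + v) % N


def prove(tag: int, r: int, v: int) -> int:
    # Honest proof (u^tag * v)^r for Phi = g^r.
    return (r * crs(tag, v)) % N


def sim_prove(alpha: int, phi: int) -> int:
    # Simulated proof Phi^alpha, which verifies only under tag = tag*.
    return (alpha * phi) % N


def verify(tag: int, phi: int, proof: int, v: int) -> bool:
    # Checks e(Phi, u^tag * v) == e(g, proof).
    return pairing(phi, crs(tag, v)) == pairing(G_GEN, proof)
```

In this model, an element outside $G_{p_1}$ admits a simulated proof under $tag^*$, whereas an exhaustive search finds no accepting proof for it under a tag that differs from $tag^*$ modulo $p_2$.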


SetupSound($\lambda, t, n$): choose a group $G$ of composite order $N = p_1 p_2$ for large
primes $p_i > 2^{l(\lambda)}$ for each $i \in \{1, 2\}$ and for some polynomial $l : \mathbb{N} \to \mathbb{N}$.
Then, conduct the following steps.


1. Pick $g \stackrel{R}{\leftarrow} G_{p_1}$, $u, v \stackrel{R}{\leftarrow} G$, $x \stackrel{R}{\leftarrow} \mathbb{Z}_N$ and set $X = g^x \in G_{p_1}$.

2. Choose a random polynomial $P[X] \in \mathbb{Z}_N[X]$ of degree $t - 1$ such that
$P(0) = x$. For each $i \in \{1, \ldots, n\}$, compute $Y_i = g^{P(i)} \in G_{p_1}$.

3. Select a pairwise independent hash function $H : G \to \{0,1\}^\ell$, where
$\ell \le l(\lambda) - 2\lambda$. Note that the range $\mathcal{K} = \{0,1\}^\ell$ of $H$ forms a group for
the bitwise exclusive OR operation $\odot = \oplus$.

4. Define private key shares $(sk_1, \ldots, sk_n)$ as $sk_i = P(i) \in \mathbb{Z}_N$ for each $i = 1$
to $n$. The vector $(vk_1, \ldots, vk_n)$ is defined as $vk_i = Y_i \in G_{p_1}$ for each $i$ and
the public key consists of $pk = ((G, G_T), N, g, X, u, v, H)$. In addition,
we have $(\mathcal{C}, \mathcal{V}, \mathcal{K}, \mathcal{K}', \mathcal{R}) = (G, G_{p_1}, \{0,1\}^\ell, G, \mathbb{Z}_N)$.


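SetupABO($\lambda, t, n, tag^*$): is like SetupSound with the difference that, instead of
being chosen uniformly in $G$, $v$ is defined as $v = u^{-tag^*} \cdot g^{\alpha}$ for some random
$\alpha \stackrel{R}{\leftarrow} \mathbb{Z}_N$. The algorithm also outputs the simulation trapdoor $\tau = \alpha \in \mathbb{Z}_N$.

Sample($pk$): parse the public key $pk$ as $((G, G_T), N, g, X, u, v, H)$. Choose $r \stackrel{R}{\leftarrow}
\mathbb{Z}_N$, compute $\Phi = g^r \in G_{p_1}$ and output the pair $(r, \Phi) \in \mathbb{Z}_N \times G_{p_1}$.

Prove($pk, tag, r, \Phi$): parse $pk$ as $((G, G_T), N, g, X, u, v, H)$ and return $\perp$ if $\Phi \ne
g^r$. Otherwise, compute and return $\pi_{SD} = (u^{tag} \cdot v)^r$.

SimProve($pk, \tau, tag, \Phi$): return $\perp$ if $tag \ne tag^*$ or if $\Phi \notin G$. Otherwise, use the
simulation trapdoor $\tau = \alpha \in \mathbb{Z}_N$ to compute and output $\pi_{SD} = \Phi^{\alpha}$.

Verify($pk, tag, \Phi, \pi_{SD}$): return 1 iff $(\Phi, \pi_{SD}) \in G^2$ and $e(\Phi, u^{tag} \cdot v) = e(g, \pi_{SD})$.

PubEval($pk, r, \Phi$): on input of the public key $pk = ((G, G_T), N, g, X, u, v, H)$,
return $\perp$ if $(r, \Phi) \notin \mathbb{Z}_N \times G$. Otherwise, output $K = H(X^r) \in \{0,1\}^\ell$.

SharePrivEval($pk, sk_i, \Phi$): return $\perp$ if $\Phi \notin G$. Otherwise, compute and return
$(K_i, \pi_{K_i})$, where $K_i = \Phi^{sk_i} = \Phi^{P(i)}$ and $\pi_{K_i} = \varepsilon$ is simply the empty string.

ShareEvalVerify($pk, vk_i, \Phi, K_i, \pi_{K_i}$): if $K_i \notin G$, $vk_i \notin G$ or $\pi_{K_i} \ne \varepsilon$, return 0.
Otherwise, return 1 if $e(g, K_i) = e(\Phi, vk_i)$. In any other situation, return 0
(the proof $\pi_{K_i}$ is ignored in this instantiation since, given key $vk_i = Y_i$, the
private evaluation share $K_i$ is directly verifiable).

Combine($pk, \Phi, \{(K_i, \pi_{K_i})\}_{i \in S}$): return $\perp$ if there exists an index $i \in S$ such
that ShareEvalVerify($pk, vk_i, \Phi, K_i, \pi_{K_i}$) = 0. Otherwise, compute and output

$$K = H\Big(\prod_{i \in S} K_i^{\Delta_{i,S}(0)}\Big) = H(\Phi^x) \in \mathcal{K},$$

where the $\Delta_{i,S}(0)$ are the Lagrange coefficients for interpolation at 0 over $S$.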
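The Combine step relies on Lagrange interpolation in the exponent: raising each share $K_i = \Phi^{P(i)}$ to its Lagrange coefficient and multiplying recovers $\Phi^{P(0)} = \Phi^x$. The following sketch illustrates only this step, under assumptions that differ from the construction above: it uses a toy prime-order group (the order-1019 subgroup modulo 2039) rather than a composite-order pairing group, it omits the hash function $H$ and the share verification, and the helper names are hypothetical.

```python
import secrets

# Toy parameters (assumption, insecure): subgroup of prime order Q in Z_P^*.
P, Q, G = 2039, 1019, 4


def share_secret(x: int, t: int, n: int) -> dict:
    """Shamir-share x with a random polynomial of degree t - 1 over Z_Q."""
    coeffs = [x] + [secrets.randbelow(Q) for _ in range(t - 1)]

    def poly(i: int) -> int:
        return sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q

    return {i: poly(i) for i in range(1, n + 1)}


def lagrange_at_zero(i: int, S: list) -> int:
    """Coefficient Delta_{i,S}(0) = prod_{j != i} (0 - j) / (i - j) mod Q."""
    num, den = 1, 1
    for j in S:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q  # modular inverse, Python 3.8+


def combine(shares_K: dict) -> int:
    """Multiply the shares K_i raised to their Lagrange coefficients."""
    S = list(shares_K)
    K = 1
    for i, K_i in shares_K.items():
        K = K * pow(K_i, lagrange_at_zero(i, S), P) % P
    return K
```

Any $t$ shares yield the same value, which is why every $t$-subset of servers decrypts consistently.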
Theorem 2. The above construction is an all-but-one perfectly sound threshold 
hash proof system if the SD assumption holds in G. (The proof is given in the 
full version of the paper). 




When the above all-but-one perfectly sound threshold hash proof system is 
plugged into the generic construction of Section 4, the resulting threshold
cryptosystem bears resemblance with the scheme in , which makes use of groups
whose order is a product of three primes. However, it is more efficient and its 
security proof is completely different as the dual system encryption approach 
[50] is not used here. 


5.2 Construction in Prime Order Groups 


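This section presents an all-but-one threshold hash proof system based on the DLIN
assumption in prime order bilinear groups. The public key comprises elements
$(g, g_1, g_2, X_1, X_2) \in G^5$, where $X_1 = g_1^{x_1} \cdot g^z$, $X_2 = g_2^{x_2} \cdot g^z$ and $(x_1, x_2, z)$ are part
of the private key. The sets $\mathcal{C}$ and $\mathcal{V} \subset \mathcal{C}$ consist of $\mathcal{C} = G^3$ and $\mathcal{V} = \{(\Phi_1, \Phi_2, \Phi_3) =
(g_1^{\theta_1}, g_2^{\theta_2}, g^{\theta_1 + \theta_2}) \mid \theta_1, \theta_2 \in \mathbb{Z}_p\}$, respectively. For any $\Phi = (\Phi_1, \Phi_2, \Phi_3) \in \mathcal{V}$, the
public evaluation algorithm computes $X_1^{\theta_1} \cdot X_2^{\theta_2}$, which can be privately evaluated
as $\Phi_1^{x_1} \cdot \Phi_2^{x_2} \cdot \Phi_3^{z}$.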
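The agreement between public and private evaluation follows from $X_1^{\theta_1} X_2^{\theta_2} = g_1^{x_1 \theta_1} g_2^{x_2 \theta_2} g^{z(\theta_1 + \theta_2)} = \Phi_1^{x_1} \Phi_2^{x_2} \Phi_3^{z}$. A minimal sketch of this identity follows, assuming a toy group without pairings (the order-1019 subgroup modulo 2039) and hypothetical function names; it leaves out the threshold sharing and all proofs.

```python
import secrets

# Toy parameters (assumption, insecure): subgroup of prime order Q in Z_P^*.
P, Q, G = 2039, 1019, 4


def keygen():
    """Return pk = (g1, g2, X1, X2) and sk = (x1, x2, z)."""
    g1 = pow(G, 1 + secrets.randbelow(Q - 1), P)
    g2 = pow(G, 1 + secrets.randbelow(Q - 1), P)
    x1, x2, z = (secrets.randbelow(Q) for _ in range(3))
    X1 = pow(g1, x1, P) * pow(G, z, P) % P
    X2 = pow(g2, x2, P) * pow(G, z, P) % P
    return (g1, g2, X1, X2), (x1, x2, z)


def sample(pk):
    """Sample Phi = (g1^th1, g2^th2, g^(th1+th2)) with witness (th1, th2)."""
    g1, g2, _, _ = pk
    th1, th2 = secrets.randbelow(Q), secrets.randbelow(Q)
    phi = (pow(g1, th1, P), pow(g2, th2, P), pow(G, (th1 + th2) % Q, P))
    return (th1, th2), phi


def pub_eval(pk, theta) -> int:
    """Public evaluation X1^th1 * X2^th2, which needs the witness."""
    _, _, X1, X2 = pk
    return pow(X1, theta[0], P) * pow(X2, theta[1], P) % P


def priv_eval(sk, phi) -> int:
    """Private evaluation Phi1^x1 * Phi2^x2 * Phi3^z, which needs the key."""
    x1, x2, z = sk
    return pow(phi[0], x1, P) * pow(phi[1], x2, P) * pow(phi[2], z, P) % P
```

For elements of $\mathcal{V}$ the two evaluations coincide; for elements outside $\mathcal{V}$ the public key no longer determines the private evaluation, which is what the universality property captures.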

As in the previous instantiation, we append to elements ® € V a non-interactive 
proof of their membership of $\mathcal{V}$ (i.e., a proof that $(g, g_1, g_2, \Phi_1, \Phi_2, \Phi_3)$ is a linear
tuple) and, in this case, the proof is obtained using the Groth-Sahai techniques. 
However, we cannot simply combine them with a DLIN-based hash proof system 
in the obvious way. The reason is that, using parameters produced by SetupABO 
and under the special tag tag*, SimProve must be able to compute a fake non- 
interactive proof of the statement ® € V for an element ® ¢ V. At the same time, 
we should make sure that, for any tag such that $tag \ne tag^*$, it will be impossible
to simulate such proofs. To solve this problem, we need a form of one-time sim- 
ulation soundness [46] which can be possibly obtained from Groth’s simulation- 
sound non-interactive proofs [29] or a more efficient variant suggested by Katz 
and Vaikuntanathan [35]. However, the specific language that we consider allows 
for even more efficient constructions: it is actually possible to build on the Groth- 
Sahai proofs essentially without any loss of efficiency. 

The solution is as follows. After having sampled a tuple $\Phi = (\Phi_1, \Phi_2, \Phi_3) \in
\mathcal{V}$, the sampler generates his proof using a Groth-Sahai CRS that depends on
tag. Algorithm SetupABO produces parameters in the fashion of the all-but- 
one technique [7]: the tag-based CRS is perfectly WI on the special tag tag* 
(which allows generating NIZK proofs for this tag) and perfectly sound for any 
other tag, which makes it impossible to convincingly prove false statements on 
tags $tag \ne tag^*$. Malkin, Teranishi, Vahlis and Yung [42] used a similar idea
of message-dependent CRS in the context of signatures. A difference with [42] 
is that we do not need to extract witnesses from adversarially-generated proofs 
and only use them as proofs of membership. 
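The tag-dependent CRS can be illustrated by working with discrete logarithms. The sketch below rests on assumptions made only for illustration: a vector such as $(g_1, 1, g)$ is written as its exponent vector $(a_1, 0, 1)$ with $g_1 = g^{a_1}$, the third vector is built as $\vec{g}_1^{\xi_1} \cdot \vec{g}_2^{\xi_2} \cdot (1,1,g)^{-tag^*}$ and shifted by $(1,1,g)^{tag}$, and the CRS is treated as perfectly sound exactly when the three vectors are linearly independent. The function names and the modulus 1019 are hypothetical.

```python
Q = 1019  # toy prime group order (assumption)


def det3_mod(m, q: int) -> int:
    """Determinant of a 3x3 matrix modulo q."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % q


def abo_crs(a1: int, a2: int, xi1: int, xi2: int, tag_star: int):
    """Exponent vectors of g1_vec, g2_vec and g3_vec in all-but-one mode."""
    g1 = (a1, 0, 1)  # (g1, 1, g)
    g2 = (0, a2, 1)  # (1, g2, g)
    span = tuple((xi1 * x + xi2 * y) % Q for x, y in zip(g1, g2))
    g3 = (span[0], span[1], (span[2] - tag_star) % Q)  # times (1,1,g)^(-tag*)
    return g1, g2, g3


def crs_for_tag(g1, g2, g3, tag: int):
    """CRS (g1_vec, g2_vec, g3_vec * (1,1,g)^tag) used for a given tag."""
    g3_tag = (g3[0], g3[1], (g3[2] + tag) % Q)
    return g1, g2, g3_tag


def is_perfectly_sound(crs) -> bool:
    """Linearly independent vectors correspond to the soundness setting."""
    return det3_mod(crs, Q) != 0
```

The determinant equals $(tag - tag^*) \cdot a_1 a_2 \bmod Q$ in this model, so the CRS degenerates to the witness indistinguishable setting on $tag^*$ only.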

Interestingly, the same technique can be applied to have a more efficient 
simulation-sound proof of plaintext equality in the Naor-Yung-type [43]
cryptosystem in [35, Section 3.2.2]: the proof can be reduced from 60 to 22 group
elements and the ciphertext size is decreased by more than 50%. 
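SetupSound($\lambda, t, n$): Choose a group $G$ of prime order $p > 2^{\lambda}$ with generators
$g, g_1, g_2, f_1, f_2 \in G$.
1. Choose $x_1, x_2, z \stackrel{R}{\leftarrow} \mathbb{Z}_p$ and set $X_1 = g_1^{x_1} g^z$, $X_2 = g_2^{x_2} g^z$. Define the
vectors $\vec{g}_1 = (g_1, 1, g)$ and $\vec{g}_2 = (1, g_2, g)$. Then, pick $\xi_1, \xi_2 \stackrel{R}{\leftarrow} \mathbb{Z}_p$ and
define $\vec{g}_3 = \vec{g}_1^{\,\xi_1} \cdot \vec{g}_2^{\,\xi_2}$.
2. Choose $\phi_1, \phi_2 \stackrel{R}{\leftarrow} \mathbb{Z}_p$ and define vectors $\vec{f}_1 = (f_1, 1, g)$, $\vec{f}_2 = (1, f_2, g)$ and
$\vec{f}_3 = \vec{f}_1^{\,\phi_1} \cdot \vec{f}_2^{\,\phi_2} \cdot (1, 1, g)$.

3. Choose random polynomials $P_1[X], P_2[X], P[X] \in \mathbb{Z}_p[X]$ of degree $t - 1$
such that $P_1(0) = x_1$, $P_2(0) = x_2$ and $P(0) = z$. For each $i = 1$ to $n$,
compute $Y_{i,1} = g_1^{P_1(i)} g^{P(i)}$, $Y_{i,2} = g_2^{P_2(i)} g^{P(i)}$.

4. Define shares $SK = (sk_1, \ldots, sk_n)$ as $sk_i = (P_1(i), P_2(i), P(i)) \in (\mathbb{Z}_p)^3$
for each $i \in \{1, \ldots, n\}$. Verification keys $VK = (vk_1, \ldots, vk_n)$ are defined
as $vk_i = (Y_{i,1}, Y_{i,2}) \in G^2$ for each $i \in \{1, \ldots, n\}$ and the public key is

$$pk = ((G, G_T), g, \vec{g}_1, \vec{g}_2, \vec{g}_3, \vec{f}_1, \vec{f}_2, \vec{f}_3, X_1, X_2).$$

As for the sets $(\mathcal{C}, \mathcal{K}, \mathcal{K}', \mathcal{R})$, they are defined as $\mathcal{C} = G^3$, $\mathcal{K} = \mathcal{K}' = G$
and $\mathcal{R} = (\mathbb{Z}_p)^2$, respectively. The subset $\mathcal{V} \subset \mathcal{C}$ consists of the language
$(\Phi_1, \Phi_2, \Phi_3) \in G^3$ for which there exist $\theta_1, \theta_2 \in \mathbb{Z}_p$ such that $\Phi_1 = g_1^{\theta_1}$,
$\Phi_2 = g_2^{\theta_2}$ and $\Phi_3 = g^{\theta_1 + \theta_2}$.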

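SetupABO($\lambda, t, n, tag^*$): is like SetupSound with the following differences.

1. In step 1, $\vec{g}_3$ is set as $\vec{g}_3 = \vec{g}_1^{\,\xi_1} \cdot \vec{g}_2^{\,\xi_2} \cdot (1, 1, g)^{-tag^*}$ so that $\vec{g}_3 \notin \mathrm{span}(\vec{g}_1, \vec{g}_2)$.
2. In step 2, the vectors $(\vec{f}_1, \vec{f}_2, \vec{f}_3)$ are chosen so as to have $\vec{f}_3 = \vec{f}_1^{\,\phi_1} \cdot \vec{f}_2^{\,\phi_2}$.
3. The algorithm also outputs the trapdoor $\tau = (\xi_1, \xi_2, \phi_1, \phi_2) \in (\mathbb{Z}_p)^4$.

Sample($pk$): choose $\theta_1, \theta_2 \stackrel{R}{\leftarrow} \mathbb{Z}_p$, compute $\Phi = (\Phi_1, \Phi_2, \Phi_3) = (g_1^{\theta_1}, g_2^{\theta_2}, g^{\theta_1 + \theta_2})$
and output $((\theta_1, \theta_2), \Phi)$.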

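Prove($pk, tag, (\theta_1, \theta_2), \Phi$): parse $pk$ as $((G, G_T), g, \vec{g}_1, \vec{g}_2, \vec{g}_3, \vec{f}_1, \vec{f}_2, \vec{f}_3, X_1, X_2)$.
Parse $\Phi$ as $(\Phi_1, \Phi_2, \Phi_3)$. Define$^{1}$ $\vec{g}_{3,tag} = \vec{g}_3 \cdot (1, 1, g)^{tag}$ and use $\mathbf{g}_{tag} = (\vec{g}_1, \vec{g}_2, \vec{g}_{3,tag})$
as a Groth-Sahai CRS to generate a NIZK proof that $(g, g_1, g_2, \Phi_1, \Phi_2, \Phi_3)$ is
a linear tuple. More precisely, generate commitments $\vec{C}_{\theta_1}, \vec{C}_{\theta_2}$ to exponents
$\theta_1, \theta_2 \in \mathbb{Z}_p$ (in other words, compute $\vec{C}_{\theta_i} = \vec{g}_{3,tag}^{\,\theta_i} \cdot \vec{g}_1^{\,r_i} \cdot \vec{g}_2^{\,s_i}$ with $r_i, s_i \stackrel{R}{\leftarrow} \mathbb{Z}_p$
for each $i \in \{1, 2\}$) and a proof $\pi_{(\theta_1, \theta_2)}$ that they satisfy

$$\Phi_1 = g_1^{\theta_1}, \qquad \Phi_2 = g_2^{\theta_2}, \qquad \Phi_3 = g^{\theta_1 + \theta_2}. \qquad (1)$$

The whole proof $\pi_{LIN}$ for $\Phi$ consists of $\vec{C}_{\theta_1}$, $\vec{C}_{\theta_2}$ and $\pi_{(\theta_1, \theta_2)}$ (see the full
version of the paper for details about the generation of this proof) and
requires 12 elements of $G$.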

SimProve($pk, \tau, tag, \Phi$): parses $pk$ as above, $\tau$ as $(\xi_1, \xi_2, \phi_1, \phi_2) \in (\mathbb{Z}_p)^4$ and $\Phi$
as $(\Phi_1, \Phi_2, \Phi_3) \in G^3$. If $tag \ne tag^*$, return $\perp$. Otherwise, the commitments
$\vec{C}_{\theta_1}, \vec{C}_{\theta_2}$ and the proof $\pi_{LIN}$ must be generated for the Groth-Sahai CRS
$\mathbf{g}_{tag^*} = (\vec{g}_1, \vec{g}_2, \vec{g}_{3,tag^*})$, where $\vec{g}_{3,tag^*} = \vec{g}_3 \cdot (1, 1, g)^{tag^*} = \vec{g}_1^{\,\xi_1} \cdot \vec{g}_2^{\,\xi_2}$, which is a
Groth-Sahai CRS for the witness indistinguishability setting.

1. Using the trapdoor $(\xi_1, \xi_2)$, simulate proofs for multi-exponentiation
equations (see the full version of the paper for details as to how such
proofs can be simulated). That is, generate $\vec{C}_{\theta_1}, \vec{C}_{\theta_2}$ as commitments to
0 and compute $\pi_{(\theta_1, \theta_2)}$ as a simulated proof that relations (1) hold.

2. Output $\pi_{LIN} = (\vec{C}_{\theta_1}, \vec{C}_{\theta_2}, \pi_{(\theta_1, \theta_2)})$ that consists of perfectly hiding
commitments and simulated NIZK proofs which, on the CRS $(\vec{g}_1, \vec{g}_2, \vec{g}_{3,tag^*})$,
are distributed as real proofs.


1 We assume that tags are non-zero. This can be enforced by having Prove and Verify
output $\perp$ when $tag = 0$.

Verify (pk, tag, ®, TLIN): parse pk and ® as above. Also, parse the proof TLIN 
as (Co, Cons T (61 ,02)) E G!?. Then, compute Gag = 93: (1,1, g)°E and use 
Stag = Gh) as a Groth-Sahai CRS to verify mur. If the latter is 
deemed as a valid proof for the relations (I), return 1. Otherwise, return 0. 

PubEval(pk, (01,02), 8): parse pk and @ as above. Return L if (81, 82,83) Æ 
(g%, 962, 9° +). Otherwise, compute and return K = Xf! . X €K. 

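SharePrivEval(pk, $sk_i$, $\Phi$): parse $sk_i$ as $(P_1(i), P_2(i), P(i)) \in (Z_p)^3$ and return $\perp$
if $\Phi \notin G^3$. Otherwise, return $(K_i, \pi_{K_i})$, where $K_i = \Phi_1^{P_1(i)} \cdot \Phi_2^{P_2(i)} \cdot \Phi_3^{P(i)}$
and $\pi_{K_i} = (C_{P_1}, C_{P_2}, C_P, \tilde{\pi}_{K_i}) \in G^{15}$ is a proof consisting of commitments
$C_{P_1}$, $C_{P_2}$, $C_P$ to exponents $P_1(i), P_2(i), P(i) \in Z_p$ and a proof $\tilde{\pi}_{K_i}$ that these
satisfy the equations

$$K_i = \Phi_1^{P_1(i)} \cdot \Phi_2^{P_2(i)} \cdot \Phi_3^{P(i)}, \qquad Y_{i,1} = g_1^{P_1(i)} g^{P(i)}, \qquad Y_{i,2} = g_2^{P_2(i)} g^{P(i)}. \qquad (2)$$

The perfectly binding commitments $C_{P_1}$, $C_{P_2}$, $C_P$ and the proof $\tilde{\pi}_{K_i}$ are generated
using $\mathbf{f} = (\vec{f}_1, \vec{f}_2, \vec{f}_3)$ as a Groth-Sahai CRS (in such a way
that $C_{P_1} = \vec{\varphi}^{P_1(i)} \cdot \vec{f}_1^{\,r_{P_1}} \cdot \vec{f}_2^{\,s_{P_1}}$, for some random $r_{P_1}, s_{P_1} \in Z_p$, for example).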

ShareEvalVerify(pk, $vk_i$, $\Phi$, $K_i$, $\pi_{K_i}$): parse $vk_i$ as $(Y_{i,1}, Y_{i,2}) \in G^2$ and return $\perp$
if $(K_i, \pi_{K_i})$ cannot be parsed as a tuple in $G \times G^{15}$. Otherwise, parse $\pi_{K_i}$
as $\pi_{K_i} = (C_{P_1}, C_{P_2}, C_P, \tilde{\pi}_{K_i}) \in G^{15}$ and return 1 if $\tilde{\pi}_{K_i}$ is a valid proof for
equations (2). In any other situation, return 0.


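Combine(pk, $\Phi$, $\{(K_i, \pi_{K_i})\}_{i \in S}$): return $\perp$ if there is an index $i \in S$ for which
ShareEvalVerify(pk, $vk_i$, $\Phi$, $K_i$, $\pi_{K_i}$) = 0. Otherwise, compute

$$K = \prod_{i \in S} K_i^{\Delta_{i,S}(0)} = \Phi_1^{P_1(0)} \cdot \Phi_2^{P_2(0)} \cdot \Phi_3^{P(0)} \in \mathcal{K}.$$
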
Theorem 3. The above construction is an all-but-one perfectly sound threshold
hash proof system assuming that the DLIN assumption holds in G. (The proof
is given in the full version of the paper.)


The proof component $\pi_{(\theta_1,\theta_2)}$ of $\pi_{LIN}$ takes 6 group elements whereas $C_{\theta_1}$, $C_{\theta_2}$ require 3 group
elements each. If the scheme is instantiated using Groth’s one-time signature [29]
(which relies on the discrete logarithm assumption), SVK and $\sigma$ demand 3 and 2
group elements, respectively. The whole ciphertext C thus consists of 21 group
elements. Concretely, if each element has a representation of 512 bits, at the
128-bit security level, the ciphertext overhead amounts to 10240 bits.

From a computational standpoint, assuming that a multi-exponentiation with 
two base elements has roughly the same cost as a single-base exponentiation, the 
sender has to compute 19 exponentiations in G (we include the cost of generating 
SVK which incurs three exponentiations in Groth’s one-time signature [29]). As 
for the verifier’s workload, the validity of a ciphertext can be checked by com- 
puting a product of 12 pairings (which is more efficient than naively evaluating 
12 individual pairings) using batch verification techniques as in [6]. 

In the full version of the paper, we show an even more efficient instantiation 
based on the Symmetric eXternal Diffie-Hellman assumption in prime order 
groups: only 6 pairing evaluations suffice to check the proof.


Acknowledgements. We thank the anonymous reviewers and Carla Rafols for 
useful comments. 
Abstract. In 2008, Groth and Sahai proposed a powerful suite of techniques
for constructing non-interactive zero-knowledge proofs in bilinear
groups. Their proof systems have found numerous applications, includ- 
ing group signature schemes, anonymous voting, and anonymous creden- 
tials. In this paper, we demonstrate that the notion of smooth projective 
hash functions can be useful to design round-optimal privacy-preserving 
interactive protocols. We show that this approach is suitable for design- 
ing schemes that rely on standard security assumptions in the standard 
model with a common-reference string and are more efficient than those 
obtained using the Groth-Sahai methodology. As an illustration of our 
design principle, we construct an efficient oblivious signature-based en- 
velope scheme and a blind signature scheme, both round-optimal. 


1 Introduction 


In 2008, Groth and Sahai proposed a way to produce efficient and practi- 
cal non-interactive zero-knowledge and non-interactive witness-indistinguishable 
proofs for (algebraic) statements related to groups equipped with a bilinear map. 
These proof systems have been studied extensively in cryptography and used in a wide variety
of applications in recent years (e.g. group signature schemes [8][9][20] or blind
signatures [2][5]). While avoiding expensive NP-reductions, they
still lack practicality and it is desirable to provide more efficient tools.

Smooth projective hash functions (SPHF) were introduced by Cramer and
Shoup [13] for constructing encryption schemes. A projective hashing family is
a family of hash functions that can be evaluated in two ways: using the (secret)
hashing key, one can compute the function on every point in its domain,
whereas using the (public) projected key one can only compute the function on
a special subset of its domain. Such a family is deemed smooth if the value of
the hash function on any point outside the special subset is independent of the
projected key. If it is hard to distinguish elements of the special subset from
non-elements, then this primitive can be seen as a special type of zero-knowledge proof
system for membership in the special subset. The notion of SPHF has found
applications in various contexts in cryptography (e.g. [18][26][1]). We present
some other applications with privacy-preserving primitives that were already
inherently interactive.
Applications: Our two applications are Oblivious Signature-Based Envelope [27]
and Blind Signatures [12].

Oblivious Signature-Based Envelopes (OSBE) were introduced in [27]. They can
be viewed as a nice way to ease the asymmetrical aspect of several authentication
protocols. Alice is a member of an organization and possesses a certificate
produced by an authority attesting she is in this organization. Bob wants to
send a private message P to members of this organization. However, due to the
sensitive nature of the organization, Alice does not want to give Bob either her
certificate or a proof that she belongs to the organization. OSBE lets Bob send an
obfuscated version of this message P to Alice, in such a way that Alice will be
able to find P if and only if Alice is in the required organization. In the process,
Bob cannot decide whether Alice really belongs to the organization.
OSBE schemes are part of a growing field of protocols, around automated trust negotiation,
which also includes Secret Handshakes [3], Password-based Authenticated
Key-Exchange [19], and Hidden Credentials [10]. Those schemes are all closely
related, so due to space constraints, we focus on OSBE (since by
tweaking two of them, one can produce any of the other protocols [11]).

Blind signatures were introduced by Chaum [12] for electronic cash in order 
to prevent the bank from linking a coin to its spender: they allow a user to 
obtain a signature on a message such that the signer cannot relate the resulting 
message/signature pair to the execution of the signing protocol. In [15], Fischlin 
gave a generic construction of round-optimal blind signatures in the common- 
reference string (CRS) model: the signing protocol consists of one message from 
the user to the signer and one response by the signer. The first practical instan- 
tiation of round-optimal blind signatures in the standard model was proposed 
in [2] but it relies on non-standard computational assumptions. We recently
proposed [5] the most efficient realizations of round-optimal blind signatures
in the common-reference string model under classical assumptions. But these
schemes still use the Groth-Sahai proof systems.


Contributions: Our first contribution is to clarify and strengthen the security
requirements of an OSBE scheme, the main improvement being some protection
for both the sender and the receiver against the Certification Authority.
The OSBE notion directly echoes the idea of SPHF if we consider the language
$\mathcal{L}$ defined by encryptions of valid signatures, whose words are hard to distinguish
from non-words under the security of the encryption scheme. We show how to build, from an SPHF
on this language, an OSBE scheme in the standard model with a CRS, and we
prove the security of our construction with respect to the security of the commitment
(the ciphertext), the signature and the SPHF scheme. We then show how
to build a simple and efficient OSBE scheme relying on a classical assumption,
DLin. An asymmetrical version is available in the full version [6]. To build those
schemes, we use SPHF in a new way, avoiding the need for costly Groth-Sahai
proofs when an interaction is inherently needed in the primitive. Our method
does not add any other interaction, and so smoothly supplements those proofs.
To show the efficiency of the method, and the ease of application, we then
adapt two Blind Signature schemes proposed in [5]. Our approach fits perfectly
and significantly decreases the communication complexity of the schemes (it
is divided by more than three in one construction). Moreover, one scheme relies
on a weaker security assumption (the XDH assumption instead of the SXDH
assumption), which permits the use of more bilinear group settings (namely, Type-II
and Type-III bilinear groups [16] instead of only Type-III bilinear groups for the
construction presented in [5]).


2 Definitions 


In this section, we briefly recall the notations and the security notions of the
basic primitives we will use in the rest of the paper, namely public-key
encryption, signatures and smooth projective hash functions (SPHF), using the
Gennaro-Lindell [18] extension. More details are available in the full version [6].
In a second part, we recall and enhance the security model of oblivious
signature-based envelope protocols [27].


2.1 Notations 


Encryption Scheme. A (public-key) encryption scheme is defined by four algorithms:
param $\leftarrow$ ESetup($1^k$), (ek, dk) $\leftarrow$ EKeyGen(param), c $\leftarrow$ Encrypt(ek, m; r),
and m $\leftarrow$ Decrypt(dk, c). We will need the classical notion of IND-CPA security.
More precisely, we will use commitment schemes (as in [1]), which should be
hiding (indistinguishability) and binding (one opening only), with the additional
extractability property. The latter property thus needs an extracting algorithm
that corresponds to the decryption algorithm. Hence the notation with encryption
schemes.


Signature Scheme. A signature scheme is defined by four algorithms: param $\leftarrow$
SSetup($1^k$), (vk, sk) $\leftarrow$ SKeyGen(param), $\sigma \leftarrow$ Sign(sk, m; s), and Verif(vk, m, $\sigma$).
We will need the classical notion of EUF-CMA security.


Smooth Projective Hash Function. An SPHF system on a language $\mathcal{L}$ is
defined by five algorithms: SPHFSetup($1^k$) that generates the global parameters,
HashKG($\mathcal{L}$, param) that generates a hashing key hk, ProjKG(hk, ($\mathcal{L}$, param), W)
that derives the projection key hp, possibly depending on the word W [18][1].
Then, Hash(hk, ($\mathcal{L}$, param), W) and ProjHash(hp, ($\mathcal{L}$, param), W, w) output the
hash value, either from the hashing key, or from the projection key and the
witness. The correctness of the scheme ensures that if W is indeed in $\mathcal{L}$ with w as
a witness, then the two ways to compute the hash value give the same result. The
security of an SPHF is defined through two different notions, the smoothness and
the pseudo-randomness properties. The smoothness property guarantees that if
$W \notin \mathcal{L}$, then the hash value is statistically random (statistically indistinguishable
from a random element). The pseudo-randomness property guarantees that even for a
word $W \in \mathcal{L}$, but without the knowledge of a witness w, the hash value is
random (computationally indistinguishable from a random element). Abdalla et
al. [1] explained how to combine SPHF to deal with conjunctions and disjunctions
of the languages.
2.2 Oblivious Signature-Based Envelope


We now define an OSBE protocol, where a sender S wants to send a private
message $P \in \{0,1\}^\ell$ to a recipient R in possession of a certificate/signature on
a message M.


Definition 1 (Oblivious Signature-Based Envelope). An OSBE scheme
is defined by four algorithms (OSBESetup, OSBEKeyGen, OSBESign, OSBEVerif),
and one interactive protocol OSBEProtocol$\langle$S, R$\rangle$:


— OSBESetup($1^k$), where k is the security parameter, generates the global
parameters param;

— OSBEKeyGen(param) generates the keys (vk, sk) of the certification authority;

— OSBESign(sk, m) produces a signature $\sigma$ on the input message m, under the
signing key sk;

— OSBEVerif(vk, m, $\sigma$) checks whether $\sigma$ is a valid signature on m, w.r.t. the
public key vk; it outputs 1 if the signature is valid, and 0 otherwise.

— OSBEProtocol$\langle$S(vk, M, P), R(vk, M, $\sigma$)$\rangle$ between the sender S with the
private message P, and the recipient R with a certificate $\sigma$. If $\sigma$ is a valid
signature under vk on the common message M, then R receives P, otherwise
it receives nothing. In any case, S does not learn anything.


Such an OSBE scheme should be (the last three properties are additional, or
stronger, security properties compared with the original definitions [27]):


— correct: the protocol actually allows R to learn P, whenever $\sigma$ is a valid
signature on M under vk;

— oblivious: the sender should not be able to distinguish whether R uses a valid
signature $\sigma$ on M under vk as input. More precisely, if $R_0$ knows and uses
a valid signature $\sigma$ and $R_1$ does not use such a valid signature, the sender
cannot distinguish an interaction with $R_0$ from an interaction with $R_1$;

— (weakly) semantically secure: the recipient learns nothing about the input P of S if
it does not use a valid signature $\sigma$ on M under vk as input. More precisely, if
$S_0$ owns $P_0$ and $S_1$ owns $P_1$, the recipient that does not use a valid signature
cannot distinguish an interaction with $S_0$ from an interaction with $S_1$;

— semantically secure (denoted sem): the above indistinguishability should hold
even if the receiver has seen several interactions $\langle$S(vk, M, P), R(vk, M, $\sigma$)$\rangle$
with valid signatures, and the same sender’s input P;

— escrow free (denoted esc): the authority (owner of the signing key sk), playing
as the sender or just eavesdropping, is unable to distinguish whether R used
a valid signature $\sigma$ on M under vk as input. This notion supersedes the
above oblivious property, since this is basically oblivious w.r.t. the authority,
without any restriction.

— semantically secure w.r.t. the authority (denoted sem*): after the interaction,
the authority (owner of the signing key sk) learns nothing about P.


We insist that the escrow-free property (esc) is stronger than the oblivious
property, hence we will consider the former only. However, the semantic security
w.r.t. the authority (sem*) is independent from the basic semantic security (sem)
since in the latter the adversary interacts with the sender whereas in the former
the adversary (who generated the signing keys) has only passive access to
a challenge transcript.


$\mathrm{Exp}^{esc-b}_{OSBE,A}(k)$ [Escrow Free property]

1. param $\leftarrow$ OSBESetup($1^k$)
2. vk $\leftarrow$ A(INIT : param)
3. (M, $\sigma$) $\leftarrow$ A(FIND : Send(vk, $\cdot$, $\cdot$), Rec*(vk, $\cdot$, $\cdot$, 0), Exec*(vk, $\cdot$, $\cdot$, $\cdot$))
4. OSBEProtocol$\langle$A, Rec*(vk, M, $\sigma$, b)$\rangle$
5. b' $\leftarrow$ A(GUESS : Send(vk, $\cdot$, $\cdot$), Rec*(vk, $\cdot$, $\cdot$, 0), Exec*(vk, $\cdot$, $\cdot$, $\cdot$))
6. RETURN b'


$\mathrm{Exp}^{sem^*-b}_{OSBE,A}(k)$ [Semantic security w.r.t. the authority]

1. param $\leftarrow$ OSBESetup($1^k$)
2. vk $\leftarrow$ A(INIT : param)
3. (M, $\sigma$, $P_0$, $P_1$) $\leftarrow$ A(FIND : Send(vk, $\cdot$, $\cdot$), Rec*(vk, $\cdot$, $\cdot$, 0), Exec*(vk, $\cdot$, $\cdot$, $\cdot$))
4. transcript $\leftarrow$ OSBEProtocol$\langle$Send(vk, M, $P_b$), Rec*(vk, M, $\sigma$, 0)$\rangle$
5. b' $\leftarrow$ A(GUESS : transcript, Send(vk, $\cdot$, $\cdot$), Rec*(vk, $\cdot$, $\cdot$, 0), Exec*(vk, $\cdot$, $\cdot$, $\cdot$))
6. RETURN b'


$\mathrm{Exp}^{sem-b}_{OSBE,A}(k)$ [Semantic Security]

1. param $\leftarrow$ OSBESetup($1^k$)
2. (vk, sk) $\leftarrow$ OSBEKeyGen(param)
3. (M, $P_0$, $P_1$) $\leftarrow$ A(FIND : vk, Sign*(vk, $\cdot$), Send(vk, $\cdot$, $\cdot$), Rec(vk, $\cdot$, 0), Exec(vk, $\cdot$, $\cdot$))
4. OSBEProtocol$\langle$Send(vk, M, $P_b$), A$\rangle$
5. b' $\leftarrow$ A(GUESS : Sign(vk, $\cdot$), Send(vk, $\cdot$, $\cdot$), Rec(vk, $\cdot$, 0), Exec(vk, $\cdot$, $\cdot$))
6. IF M $\in$ SM RETURN 0 ELSE RETURN b'


Fig. 1. Security Games for OSBE

These security notions can be formalized by the security games presented in
Figure 1, where the adversary keeps some internal state between the various calls
INIT, FIND and GUESS. They make use of the oracles described below, and the
advantages of the adversary are, for all the security notions,

$$\mathrm{Adv}_{OSBE,A}(k) = \Pr[\mathrm{Exp}^{1}_{OSBE,A}(k) = 1] - \Pr[\mathrm{Exp}^{0}_{OSBE,A}(k) = 1],$$

$$\mathrm{Adv}_{OSBE}(k, t) = \max_{A \le t} \mathrm{Adv}_{OSBE,A}(k).$$


— Sign(vk, m): This oracle outputs a valid signature on m under the signing
key sk associated to vk (where the pair (vk, sk) has been output by the
OSBEKeyGen algorithm);

— Sign*(vk, m): This oracle first queries Sign(vk, m). It additionally stores the
query m to the list SM;

— Send(vk, m, P): This oracle emulates the sender with private input P, and
thus may consist of multiple interactions;

— Rec(vk, m, b): This oracle emulates the recipient either with a valid signature
$\sigma$ on m under the verification key vk (obtained from the signing oracle Sign)
if b = 0 (as the above $R_0$), or with a random string if b = 1 (as the above
$R_1$). This oracle is available only when the signing key has been generated by
OSBEKeyGen;

— Rec*(vk, m, $\sigma$, b): This oracle does as above, with a valid signature $\sigma$ provided
by the adversary. If b = 0, it emulates the recipient playing with $\sigma$; if b = 1,
it emulates the recipient playing with a random string;

— Exec(vk, m, P): This oracle outputs the transcript of an honest execution
between a sender with private input P and the recipient with a valid signature
$\sigma$ on m under the verification key vk (obtained from the signing oracle
Sign). It basically activates the Send(vk, m, P) and Rec(vk, m, 0) oracles.

— Exec*(vk, m, $\sigma$, P): This oracle outputs the transcript of an honest execution
between a sender with private input P and the recipient with a valid signature
$\sigma$ (provided by the adversary). It basically activates the Send(vk, m, P)
and Rec*(vk, m, $\sigma$, 0) oracles.


Remark 2. The OSBE schemes proposed in [27] do not satisfy the semantic
security w.r.t. the authority. This is obvious for the generic construction based
on identity-based encryption, which consists of only one flow of communication
(since a scheme that achieves the strong security notions requires at least two
flows). This is also true (to a lesser extent) for the RSA-based construction: for
any third party, the semantic security relies (in the random oracle model) on the
CDH assumption in a 2048-bit RSA group; but for the authority, it can be broken
by solving two 1024-bit discrete logarithm problems. This task is much simpler,
in particular if the authority generates the RSA modulus N = pq dishonestly
(e.g. with p − 1 and q − 1 smooth). In order to make the scheme secure in our
strong model, one needs (at least) to double the size of the RSA modulus and to
make sure that the authority has selected and correctly employed a truly random
seed in the generation of the RSA key pair [25].


3 An Efficient OSBE Scheme 


In this section, we present a high-level instantiation of OSBE with the previous
primitives as black boxes. Thereafter, we provide a specific instantiation with
linear ciphertexts. The overall security then relies on the DLin assumption, a
quite standard assumption in the standard model. Its efficiency is of the same
order of magnitude as the construction based on identity-based encryption [27]
(that only achieves weaker security notions) and better than the RSA-based
scheme which provides similar security guarantees (in the random oracle model).


3.1 High-Level Instantiation 


We assume we have an encryption scheme $\mathcal{E}$, a signature scheme $\mathcal{S}$ and an SPHF
system onto a set G. We additionally use a key derivation function KDF to
derive a pseudo-random bit-string $K \in \{0,1\}^\ell$ from a pseudo-random element v
in G. One can use the Leftover-Hash Lemma [23], with a random seed defined
in param during the global setup, to extract the entropy from v, then followed
by a pseudo-random generator to get a long enough bit-string. Many uses of
the same seed in the Leftover-Hash Lemma just lead to a security loss linear
in the number of extractions. We describe an oblivious signature-based envelope
system OSBE, to send a private message $P \in \{0,1\}^\ell$:


— OSBESetup($1^k$), where k is the security parameter:

• it first generates the global parameters for the signature scheme (using
SSetup), the encryption scheme (using ESetup), and the SPHF system
(using SPHFSetup);

• it then generates the public key ek of the encryption scheme (using
EKeyGen, while the decryption key will not be used);

The output param consists of all the individual param and the encryption
key ek;

— OSBEKeyGen(param) runs SKeyGen(param) to generate a pair (vk, sk) of
verification-signing keys;

— The OSBESign and OSBEVerif algorithms are exactly Sign and Verif from
the signature scheme;

— OSBEProtocol$\langle$S(vk, M, P), R(vk, M, $\sigma$)$\rangle$: In the following, $\mathcal{L} = \mathcal{L}(vk, M)$ will
describe the language of the ciphertexts under the above encryption key ek
of a valid signature of the input message M under the input verification key
vk (hence vk and M as inputs, while param contains ek).

• R generates and sends c = Encrypt(ek, $\sigma$; r);

• S computes hk = HashKG($\mathcal{L}$, param), hp = ProjKG(hk, ($\mathcal{L}$, param), c), v =
Hash(hk, ($\mathcal{L}$, param), c), and Q = P $\oplus$ KDF(v); S sends hp, Q to R;

• R computes v' = ProjHash(hp, ($\mathcal{L}$, param), c, r) and P' = Q $\oplus$ KDF(v').
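
The masking step of the protocol can be sketched as follows. This is only an illustration under stated assumptions: the KDF is replaced here by SHAKE-256 (the text instead suggests the Leftover-Hash Lemma followed by a pseudo-random generator), the hash value v is treated as an opaque byte string, and the helper names `kdf`, `sender_mask` and `recipient_unmask` are hypothetical.

```python
import hashlib
import secrets


def kdf(v: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from an encoding of the hash value v.

    Assumption of this sketch: SHAKE-256 stands in for the KDF of the text.
    """
    return hashlib.shake_256(v).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def sender_mask(P: bytes, v: bytes) -> bytes:
    """Sender side: Q = P xor KDF(v), with v = Hash(hk, (L, param), c)."""
    return xor(P, kdf(v, len(P)))


def recipient_unmask(Q: bytes, v_prime: bytes) -> bytes:
    """Recipient side: P' = Q xor KDF(v'), with v' = ProjHash(hp, (L, param), c, r)."""
    return xor(Q, kdf(v_prime, len(Q)))


# Placeholder for the SPHF output; in the protocol v is a group element.
v = secrets.token_bytes(32)
P = b"private message"
Q = sender_mask(P, v)
recovered = recipient_unmask(Q, v)  # equals P whenever v' = v
```

When the recipient's ciphertext is in the language, correctness of the SPHF gives v' = v and the recipient recovers P; otherwise v looks random to the recipient and Q hides P.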


3.2 Security Properties 
Theorem 3 (Correct). OSBE is sound. 


Proof. Under the correctness of the SPHF system, v' = v, and thus P' = (P $\oplus$
KDF(v)) $\oplus$ KDF(v') = P.


Theorem 4 (Escrow-Free). OSBE is escrow-free if the encryption scheme
$\mathcal{E}$ is semantically secure: $\mathrm{Adv}^{esc}_{OSBE}(k, t) \le \mathrm{Adv}^{ind}_{\mathcal{E}}(k, t')$ with $t' \approx t$.


Proof. Let us assume A is an adversary against the escrow-free property of our
scheme: the malicious adversary A is able to tell the difference between an
interaction with $R_0$ (who knows and uses a valid signature) and $R_1$ (who does
not use a valid signature), with advantage $\varepsilon$.

We now build an adversary B against the semantic security of the encryption
scheme $\mathcal{E}$:


— B is first given the parameters for $\mathcal{E}$ and an encryption key ek;

— B emulates OSBESetup: it runs SSetup and SPHFSetup by itself. For the
encryption scheme $\mathcal{E}$, the parameters and the key have already been provided
by the challenger of the encryption security game;
— A provides the verification key vk;
— B has to simulate all the oracles:
• Send(vk, M, P), for a message M and a private input P: upon receiving
c, one computes hk = HashKG($\mathcal{L}$, param), hp = ProjKG(hk, ($\mathcal{L}$, param), c),
v = Hash(hk, ($\mathcal{L}$, param), c), and Q = P $\oplus$ KDF(v). One sends back
(hp, Q);
• Rec*(vk, M, $\sigma$, 0), for a message M and a valid signature $\sigma$: B outputs
c = Encrypt(ek, $\sigma$; r);
• Exec*(vk, M, $\sigma$, P): one first runs Rec*(vk, M, $\sigma$, 0) to generate c, which is
provided to Send(vk, M, P), to generate (hp, Q).
— At some point, A outputs a message M and a valid signature $\sigma$, and B has
to simulate Rec*(vk, M, $\sigma$, b): B sets $\sigma_0 \leftarrow \sigma$ and sets $\sigma_1$ as a random string.
It sends $(\sigma_0, \sigma_1)$ to the challenger of the semantic security of the encryption
scheme and gets back c, an encryption of $\sigma_\beta$, for a random unknown bit $\beta$.
It outputs c;
— B provides again access to the above oracles, and A outputs a bit b', which B
forwards as its guess $\beta'$ for the $\beta$ involved in the semantic security game for
$\mathcal{E}$.


Note that the above simulation perfectly emulates $\mathrm{Exp}^{esc-b}_{OSBE,A}(k)$ (since basically
b is $\beta$, and b' is $\beta'$):

$$\varepsilon = \mathrm{Adv}^{esc}_{OSBE,A}(k) = \mathrm{Adv}^{ind}_{\mathcal{E},B}(k) \le \mathrm{Adv}^{ind}_{\mathcal{E}}(k, t').$$


Theorem 5 (Semantically Secure). OSBE is semantically secure if the
signature is unforgeable, the SPHF is smooth and the encryption scheme is
semantically secure (and under the pseudo-randomness of the KDF):

$$\mathrm{Adv}^{sem}_{OSBE}(k, t) \le q_U \, \mathrm{Adv}^{ind}_{\mathcal{E}}(k, t') + 2\, \mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S, t'') + 2\, \mathrm{Adv}^{smooth}_{SPHF}(k) \quad \text{with } t', t'' \approx t.$$

In the above formula, $q_U$ denotes the number of interactions the adversary has
with the sender, and $q_S$ the number of signing queries the adversary asked.


Proof. Let us assume A is an adversary against the semantic security of our
scheme: the malicious adversary A is able to tell the difference between an
interaction with $S_0$ (who owns $P_0$) and $S_1$ (who owns $P_1$), with advantage $\varepsilon$. We
start from this initial security game, and make slight modifications to bound $\varepsilon$.


Game $G_0$. Let us emulate this security game:


— B emulates the initialization of the system: it runs OSBESetup by itself, and
then OSBEKeyGen to generate (vk, sk);
— B has to simulate all the oracles:
• Sign(vk, M) and Sign*(vk, M): it runs the corresponding algorithm by
itself;
• Send(vk, M, P), for a message M and a private input P: upon receiving
c, one computes hk = HashKG($\mathcal{L}$, param), hp = ProjKG(hk, ($\mathcal{L}$, param), c),
v = Hash(hk, ($\mathcal{L}$, param), c), and Q = P $\oplus$ KDF(v). One sends back
(hp, Q);
• Rec(vk, M, 0), for a message M: B asks for a valid signature $\sigma$ on M,
computes and outputs c = Encrypt(ek, $\sigma$; r);

• Exec(vk, M, P): one simply first runs Rec(vk, M, 0) to generate c, which is
provided to Send(vk, M, P), to generate (hp, Q).

— At some point, A outputs a message M and two inputs ($P_0$, $P_1$) to distinguish
the sender, and B calls back the above Send(vk, M, $P_b$) simulation to interact
with A;

— B provides again access to the above oracles, and A outputs a bit b'.


In this game, A has an advantage $\varepsilon$ in guessing b:

$$\varepsilon = \Pr_{G_0}[b' = 1 \mid b = 1] - \Pr_{G_0}[b' = 1 \mid b = 0] = 2 \times \Pr_{G_0}[b' = b] - 1.$$


Game $G_1^\beta$. This game involves the semantic security of the encryption scheme: B
is already provided the parameters and the encryption key ek by the challenger
of the semantic security of the encryption scheme, hence the initialization is
slightly modified. In addition, B randomly chooses the bit b, and modifies the
Rec oracle simulation:


— Rec(vk, M, 0), for a message M: B asks for a valid signature $\sigma_0$ on M, and
sets $\sigma_1$ as a random string, computes and outputs c = Encrypt(ek, $\sigma_\beta$; r).


Since B knows b, it finally outputs $\beta'$ = (b' = b).

Note that $G_1^0$ is exactly $G_0$, and the distance between $G_1^0$ and $G_1^1$ relies on the
Left-or-Right security of the encryption scheme, which can be shown equivalent
to the semantic security, with a loss linear in the number of encryption queries,
which is actually the number $q_U$ of interactions with a user (the sender in this
case), due to the hybrid argument [4]:

$$\begin{aligned}
q_U \times \mathrm{Adv}^{ind}_{\mathcal{E}}(k) &\ge \Pr[\beta' = 1 \mid \beta = 0] - \Pr[\beta' = 1 \mid \beta = 1] \\
&= \Pr[b' = b \mid \beta = 0] - \Pr[b' = b \mid \beta = 1] \\
&= \bigl(2 \times \Pr_{G_1^0}[b' = b] - 1\bigr) - \bigl(2 \times \Pr_{G_1^1}[b' = b] - 1\bigr).
\end{aligned}$$

As a consequence: $\varepsilon \le q_U \times \mathrm{Adv}^{ind}_{\mathcal{E}}(k) + (2 \times \Pr_{G_1^1}[b' = b] - 1)$.


Game $G_2$. This game involves the unforgeability of the signature scheme: B is
already provided the parameters and the verification key vk for the signature scheme,
together with access to the signing oracle (note that all the signing queries Sign*
asked by the adversary in the FIND stage, i.e., before the challenge interaction
with Send(vk, M, $P_b$), are stored in SM). The simulator B generates by itself all the
other parameters and keys, namely the encryption key ek, together with the
associated decryption key dk. For the Rec oracle simulation, B keeps the random
version (as in $G_1^1$). In the challenge interaction with Send(vk, M, $P_b$), one stops
the simulation and makes the adversary win if it uses a valid signature on a
message $M \notin$ SM:
— Send(vk, M, $P_b$), during the challenge interaction: upon receiving c, if $M \notin$
SM, it first decrypts c to get the input signature $\sigma$. If $\sigma$ is a valid signature,
one stops the game, sets b' = b and outputs b'. If the signature is not
valid, the simulation remains unchanged;

— Rec(vk, M, 0), for a message M: B sets $\sigma$ as a random string, computes and
outputs c = Encrypt(ek, $\sigma$; r).


Because of the abort in the case of a valid signature on a new message, we know
that the adversary cannot use such a valid signature in the challenge. So, since M
should not be in SM, the signature will be invalid. Actually, the unique difference
from the previous game $G_1^1$ is the abort in case of a valid signature on a new
message in the challenge phase, whose probability is bounded by $\mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S)$.
Using Shoup’s Lemma [29]:

$$\Pr_{G_1^1}[b' = b] - \Pr_{G_2}[b' = b] \le \mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S).$$

As a consequence: $\varepsilon \le q_U \times \mathrm{Adv}^{ind}_{\mathcal{E}}(k) + 2 \times \mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S) + (2 \times \Pr_{G_2}[b' = b] - 1)$.


Game $G_3$. The last game involves the smoothness of the SPHF: the unique
difference is in the computation of v in the Send simulation, in the challenge phase
only: B chooses a random $v \in G$. Due to the statistical randomness of v in
the previous game, in case the signature is not valid (a word that is not in the
language), this game is statistically indistinguishable from the previous one:

$$\Pr_{G_2}[b' = b] - \Pr_{G_3}[b' = b] \le \mathrm{Adv}^{smooth}_{SPHF}(k).$$

Since $P_b$ is now masked by a truly random value, no information leaks on b:
$\Pr_{G_3}[b' = b] = 1/2$.
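
For the reader's convenience, the game hops can be chained as follows to recover the bound of Theorem 5. This only collects the inequalities stated in the proof; the superscripts ind, euf and smooth are labels assumed here for the encryption, signature and SPHF terms, and the time parameters are omitted.

```latex
\begin{align*}
\varepsilon
  &\le q_U \cdot \mathrm{Adv}^{ind}_{\mathcal{E}}(k)
     + 2\,\mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S)
     + \bigl(2\Pr_{G_2}[b' = b] - 1\bigr) \\
  &\le q_U \cdot \mathrm{Adv}^{ind}_{\mathcal{E}}(k)
     + 2\,\mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S)
     + 2\,\mathrm{Adv}^{smooth}_{SPHF}(k)
     + \bigl(2\Pr_{G_3}[b' = b] - 1\bigr) \\
  &= q_U \cdot \mathrm{Adv}^{ind}_{\mathcal{E}}(k)
     + 2\,\mathrm{Succ}^{euf}_{\mathcal{S}}(k, q_S)
     + 2\,\mathrm{Adv}^{smooth}_{SPHF}(k).
\end{align*}
```

The second line uses the smoothness hop from $G_2$ to $G_3$, and the last line uses $\Pr_{G_3}[b' = b] = 1/2$.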


Theorem 6. OSBE is semantically secure w.r.t. the authority if the SPHF
is pseudo-random (and under the pseudo-randomness of the KDF):

$$\mathrm{Adv}^{sem^*}_{OSBE}(k, t) \le 2 \times \mathrm{Adv}^{pr}_{SPHF}(k, t).$$


Proof. Let us assume A is an adversary against the semantic security w.r.t. the
authority: the malicious adversary A is able to tell the difference between an
eavesdropped interaction with $S_0$ (who owns $P_0$) and $S_1$ (who owns $P_1$), with
advantage $\varepsilon$. We start from this initial security game, and make slight modifications
to bound $\varepsilon$.


Game $G_0$. Let us emulate this security game:


— B emulates the initialization of the system: it runs OSBESetup by itself;
— A provides the verification key vk;
— B has to simulate all the oracles:
• Send(vk, M, P), for a message M and a private input P: upon receiving
c, one computes hk = HashKG($\mathcal{L}$, param), hp = ProjKG(hk, ($\mathcal{L}$, param), c),
v = Hash(hk, ($\mathcal{L}$, param), c), and Q = P $\oplus$ KDF(v). One sends back
(hp, Q);
• Rec*(vk, M, $\sigma$, 0), for a message M and a valid signature $\sigma$: B outputs
c = Encrypt(ek, $\sigma$; r);

• Exec*(vk, M, $\sigma$, P): one first runs Rec*(vk, M, $\sigma$, 0) to generate c, which is
provided to Send(vk, M, P), to generate (hp, Q).

— At some point, A outputs a message M with a valid signature $\sigma$, and
two inputs ($P_0$, $P_1$) to distinguish the sender, and B calls back the above
Send(vk, M, $P_b$) and Rec*(vk, M, $\sigma$, 0) simulations to interact together and
output the transcript (c; hp, Q);

— B provides again access to the above oracles, and A outputs a bit b'.


In this game, A has an advantage $\varepsilon$ in guessing b:

$$\varepsilon = \Pr_{G_0}[b' = 1 \mid b = 1] - \Pr_{G_0}[b' = 1 \mid b = 0] = 2 \times \Pr_{G_0}[b' = b] - 1.$$


Game $G_1$. This game involves the pseudo-randomness of the SPHF: the unique
difference is in the computation of v in the Send simulation of the eavesdropped
interaction, and so for the transcript: B chooses a random $v \in G$ and computes
Q = $P_b \oplus$ KDF(v). Due to the pseudo-randomness of v in the previous game,
since A does not know the random coins r used to encrypt $\sigma$, this game is
computationally indistinguishable from the previous one:

$$\bigl| \Pr_{G_1}[b' = b] - \Pr_{G_0}[b' = b] \bigr| \le \mathrm{Adv}^{pr}_{SPHF}(k, t).$$

Since $P_b$ is now masked by a truly random value v, no information leaks on b:
$\Pr_{G_1}[b' = b] = 1/2$.
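
The two games combine into the bound of Theorem 6 as sketched below. The superscript pr is a label assumed here for the pseudo-randomness advantage of the SPHF.

```latex
\begin{align*}
\varepsilon
  &= 2\Pr_{G_0}[b' = b] - 1 \\
  &\le 2\bigl(\Pr_{G_1}[b' = b] + \mathrm{Adv}^{pr}_{SPHF}(k, t)\bigr) - 1 \\
  &= 2 \cdot \tfrac{1}{2} + 2\,\mathrm{Adv}^{pr}_{SPHF}(k, t) - 1
   = 2\,\mathrm{Adv}^{pr}_{SPHF}(k, t).
\end{align*}
```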


3.3 Our Efficient OSBE Instantiation 


Our first construction combines the linear encryption scheme [7], the Waters
signature scheme [30] and an SPHF on linear ciphertexts [13][28]. It thus relies on
classical assumptions: CDH for the unforgeability of signatures and DLin for the
semantic security of the encryption scheme. The formal definitions are recalled
in the full version [6].


Basic Primitives. Given an encrypted Waters signature from the recipient,
the sender is able to compute a projection key and a hash corresponding to the
expected signature, and to send to the recipient the projection key and the product
of the expected hash and the message P. If the recipient was honest (a
correct ciphertext), it is able to compute the hash thanks to the projection key,
and so to find P; otherwise, it does not learn anything.

We briefly sketch the basic building blocks: linear encryption, Waters signatures
and the SPHF for linear tuples.

All these primitives work in a pairing-friendly environment $(p, G, g, G_T, e)$,
where $e : G \times G \to G_T$ is an admissible bilinear map, for two groups $G$ and $G_T$,
of prime order p, generated by $g$ and $g_T = e(g, g)$ respectively.

Waters Signatures. The public parameters are a generator $h \stackrel{\$}{\leftarrow} \mathbb{G}$ and a vector
$\vec{u} = (u_0, \ldots, u_k) \stackrel{\$}{\leftarrow} \mathbb{G}^{k+1}$, which defines the Waters hash of a message
$M = (M_1, \ldots, M_k) \in \{0,1\}^k$ as $F(M) = u_0 \prod_{i=1}^{k} u_i^{M_i}$. The public verification key is
$vk = g^z$, whose corresponding secret signing key is $sk = h^z$, for a random $z \stackrel{\$}{\leftarrow} \mathbb{Z}_p$.
The signature on a message $M \in \{0,1\}^k$ is $\sigma = (\sigma_1 = sk \cdot F(M)^s, \sigma_2 = g^s)$,
for some random $s \stackrel{\$}{\leftarrow} \mathbb{Z}_p$. It can be verified by checking
$e(g, \sigma_1) = e(vk, h) \cdot e(F(M), \sigma_2)$. This signature scheme is unforgeable under the CDH assumption.


Linear Encryption. The secret key dk is a pair of random scalars $(y_1, y_2)$ and
the public key is $ek = (Y_1 = g^{y_1}, Y_2 = g^{y_2})$. One encrypts a message $M \in \mathbb{G}$
as $c = (c_1 = Y_1^{r_1}, c_2 = Y_2^{r_2}, c_3 = g^{r_1+r_2} \cdot M)$, for random scalars $r_1, r_2 \stackrel{\$}{\leftarrow} \mathbb{Z}_p$.
To decrypt, one computes $M = c_3 / (c_1^{1/y_1} c_2^{1/y_2})$. This encryption scheme is
semantically secure under the DLin assumption.
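
As a concrete illustration of the encryption and decryption formulas, here is a minimal Python sketch. It runs in a toy prime-order subgroup of the integers modulo a small safe prime rather than in a pairing-friendly group. The parameters `Q_MOD`, `P_ORDER`, `G` and all function names are assumptions made for this example, and the sizes are far too small to offer any security.

```python
import secrets

# Toy parameters (assumption: illustration only, not secure).
Q_MOD = 2039      # safe prime, 2039 = 2 * 1019 + 1
P_ORDER = 1019    # prime order of the subgroup of squares modulo Q_MOD
G = 4             # generator of that subgroup


def rand_scalar():
    """Random scalar in [1, p - 1], so it is invertible modulo p."""
    return secrets.randbelow(P_ORDER - 1) + 1


def keygen():
    """Return (dk, ek) with dk = (y1, y2) and ek = (g^y1, g^y2)."""
    y1, y2 = rand_scalar(), rand_scalar()
    return (y1, y2), (pow(G, y1, Q_MOD), pow(G, y2, Q_MOD))


def encrypt(ek, m):
    """c = (Y1^r1, Y2^r2, g^(r1+r2) * m) for fresh random r1, r2."""
    r1, r2 = rand_scalar(), rand_scalar()
    c1 = pow(ek[0], r1, Q_MOD)
    c2 = pow(ek[1], r2, Q_MOD)
    c3 = pow(G, r1 + r2, Q_MOD) * m % Q_MOD
    return c1, c2, c3


def decrypt(dk, c):
    """m = c3 / (c1^(1/y1) * c2^(1/y2)); 1/y is taken modulo the group order."""
    y1, y2 = dk
    c1, c2, c3 = c
    mask = pow(c1, pow(y1, -1, P_ORDER), Q_MOD) * pow(c2, pow(y2, -1, P_ORDER), Q_MOD) % Q_MOD
    return c3 * pow(mask, -1, Q_MOD) % Q_MOD
```

Encrypting a group element such as `pow(G, 123, Q_MOD)` and decrypting the result returns the same element, because `c1^(1/y1) * c2^(1/y2)` equals `g^(r1+r2)`.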


DLin-compatible Smooth-Projective Hash Function. This is actually a weaker
variant of [28]. The language $\mathcal{L}$ consists of the linear tuples w.r.t. a basis $(u, v, g)$.
For a linear encryption key $ek = (Y_1, Y_2)$, a ciphertext $C = (c_1, c_2, c_3)$ is an
encryption of the message $M$ if $(c_1, c_2, c_3/M)$ is a linear tuple w.r.t. the basis
$(Y_1, Y_2, g)$. The language $\mathsf{Lin}(ek, M)$ consists of these ciphertexts. An SPHF for
this language can be:

$\mathsf{HashKG}(\mathsf{Lin}(ek, M)) = hk = (x_1, x_2, x_3) \stackrel{\$}{\leftarrow} \mathbb{Z}_p^3$
$\mathsf{Hash}(hk; \mathsf{Lin}(ek, M), C) = c_1^{x_1} c_2^{x_2} (c_3/M)^{x_3}$
$\mathsf{ProjKG}(hk; \mathsf{Lin}(ek, M), C) = hp = (Y_1^{x_1} g^{x_3}, Y_2^{x_2} g^{x_3})$
$\mathsf{ProjHash}(hp; \mathsf{Lin}(ek, M), C; r) = hp_1^{r_1} hp_2^{r_2}$

This function is defined for linear tuples in $\mathbb{G}$, but it could work in any group,
since it does not make use of pairings. In particular, we use it below in $\mathbb{G}_T$.
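
The four algorithms above can be checked numerically. The sketch below reuses a toy prime-order subgroup modulo a small safe prime. The parameters and function names are assumptions for illustration, not part of the scheme's specification, and the sizes are insecure. It shows that `Hash` and `ProjHash` agree on a correct ciphertext and disagree when the ciphertext encrypts a different message.

```python
import secrets

# Toy parameters (assumption: illustration only, not secure).
Q_MOD, P_ORDER, G = 2039, 1019, 4   # subgroup of order 1019 modulo the prime 2039


def rnd():
    """Random nonzero scalar modulo the group order."""
    return secrets.randbelow(P_ORDER - 1) + 1


def hash_kg():
    """hk = (x1, x2, x3)."""
    return rnd(), rnd(), rnd()


def hash_value(hk, m, c):
    """Hash(hk; Lin(ek, m), c) = c1^x1 * c2^x2 * (c3 / m)^x3."""
    x1, x2, x3 = hk
    c1, c2, c3 = c
    c3_over_m = c3 * pow(m, -1, Q_MOD) % Q_MOD
    return pow(c1, x1, Q_MOD) * pow(c2, x2, Q_MOD) * pow(c3_over_m, x3, Q_MOD) % Q_MOD


def proj_kg(hk, ek):
    """hp = (Y1^x1 * g^x3, Y2^x2 * g^x3)."""
    x1, x2, x3 = hk
    return (pow(ek[0], x1, Q_MOD) * pow(G, x3, Q_MOD) % Q_MOD,
            pow(ek[1], x2, Q_MOD) * pow(G, x3, Q_MOD) % Q_MOD)


def proj_hash(hp, r1, r2):
    """ProjHash(hp; ...; r) = hp1^r1 * hp2^r2, using the encryption coins."""
    return pow(hp[0], r1, Q_MOD) * pow(hp[1], r2, Q_MOD) % Q_MOD


def linear_encrypt(ek, m, r1, r2):
    """Linear encryption with explicit coins, so they can serve as the witness."""
    return (pow(ek[0], r1, Q_MOD), pow(ek[1], r2, Q_MOD),
            pow(G, r1 + r2, Q_MOD) * m % Q_MOD)
```

For a ciphertext of `m` built with coins `(r1, r2)`, `hash_value(hk, m, c)` equals `proj_hash(proj_kg(hk, ek), r1, r2)`. Evaluating the hash against a different message shifts the result by a nontrivial power of `g`, which is the behaviour smoothness relies on.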


Smooth-Projective Hash Function for Linear Encryption of Valid Waters Signatures.
We will consider a slightly more complex language: the ciphertexts under
$ek$ of a valid signature of $M$ under $vk$. A given ciphertext $C = (c_1, c_2, c_3, \sigma_2)$
contains a valid signature of $M$ if and only if $(c_1, c_2, c_3)$ actually encrypts $\sigma_1$
such that $(\sigma_1, \sigma_2)$ is a valid Waters signature on $M$. The latter means that


$(C_1 = e(c_1, g),\ C_2 = e(c_2, g),\ C_3 = e(c_3, g)/(e(h, vk) \cdot e(F(M), \sigma_2)))$

is a linear tuple in basis $(U = e(Y_1, g), V = e(Y_2, g), g_T = e(g, g))$ in $\mathbb{G}_T$. Since
the basis consists of 3 elements of the form $e(\cdot, g)$, the projected key can be
compacted in $\mathbb{G}$. We thus consider the language $\mathsf{WLin}(ek, vk, M)$ that contains
these quadruples $(c_1, c_2, c_3, \sigma_2)$, and its SPHF:

$\mathsf{HashKG}(\mathsf{WLin}(ek, vk, M)) = hk = (x_1, x_2, x_3) \stackrel{\$}{\leftarrow} \mathbb{Z}_p^3$
$\mathsf{Hash}(hk; \mathsf{WLin}(ek, vk, M), C) = e(c_1, g)^{x_1} e(c_2, g)^{x_2} (e(c_3, g)/(e(h, vk) e(F(M), \sigma_2)))^{x_3}$
$\mathsf{ProjKG}(hk; \mathsf{WLin}(ek, vk, M), C) = hp = (ek_1^{x_1} g^{x_3}, ek_2^{x_2} g^{x_3})$
$\mathsf{ProjHash}(hp; \mathsf{WLin}(ek, vk, M), C; r) = e(hp_1^{r_1} hp_2^{r_2}, g)$

Instantiation. We now define our OSBE protocol, where a sender S wants to
send a private message $P \in \{0,1\}^\ell$ to a recipient R in possession of a Waters
signature on a message $M$.


- $\mathsf{OSBESetup}(1^k)$, where $k$ is the security parameter, defines a pairing-friendly
environment $(p, \mathbb{G}, g, \mathbb{G}_T, e)$, the public parameters $h \in \mathbb{G}$, an encryption
key $ek = (Y_1 = g^{y_1}, Y_2 = g^{y_2})$, where $(y_1, y_2) \stackrel{\$}{\leftarrow} \mathbb{Z}_p^2$, and
$\vec{u} = (u_0, \ldots, u_k) \stackrel{\$}{\leftarrow} \mathbb{G}^{k+1}$ for the Waters signature. All these elements constitute the string
param;

- $\mathsf{OSBEKeyGen}(param)$: the authority generates a pair of keys $(vk = g^z, sk = h^z)$
for a random scalar $z \in \mathbb{Z}_p$;

- $\mathsf{OSBESign}(sk, M)$ produces a signature $\sigma = (h^z F(M)^s, g^s)$;

- $\mathsf{OSBEVerif}(vk, M, \sigma)$ checks if $e(\sigma_1, g) = e(\sigma_2, F(M)) \cdot e(h, vk)$;

- $\mathsf{OSBEProtocol}\langle S(vk, M, P), R(vk, M, \sigma)\rangle$ runs as follows:


  • R chooses random $r_1, r_2 \stackrel{\$}{\leftarrow} \mathbb{Z}_p$ and sends a linear encryption of $\sigma$:
    $C = (c_1 = ek_1^{r_1}, c_2 = ek_2^{r_2}, c_3 = g^{r_1+r_2} \cdot \sigma_1, \sigma_2)$;

  • S chooses random $(x_1, x_2, x_3) \stackrel{\$}{\leftarrow} \mathbb{Z}_p^3$ and computes:

    * $\mathsf{HashKG}(\mathsf{WLin}(ek, vk, M)) = hk = (x_1, x_2, x_3)$;

    * $\mathsf{Hash}(hk; \mathsf{WLin}(ek, vk, M), C) = v = e(c_1, g)^{x_1} e(c_2, g)^{x_2} (e(c_3, g)/(e(h, vk) e(F(M), \sigma_2)))^{x_3}$;

    * $\mathsf{ProjKG}(hk; \mathsf{WLin}(ek, vk, M), C) = hp = (ek_1^{x_1} g^{x_3}, ek_2^{x_2} g^{x_3})$;

  • S then sends $(hp, Q = P \oplus \mathsf{KDF}(v))$ to R;

  • R computes $v' = e(hp_1^{r_1} hp_2^{r_2}, g)$ and $P' = Q \oplus \mathsf{KDF}(v')$.


An asymmetric instantiation can be found in the full version [6]. 


3.4 Security and Efficiency 


We now provide a security analysis of this scheme. This instantiation differs from
the high-level instantiation presented before in the ciphertext $C$ of the signature
$\sigma = (\sigma_1, \sigma_2)$: the second half of the signature remains in the clear. The ciphertext
thus does not, by itself, guarantee the semantic security of the encrypted signature.
However, thanks to the randomizability of Waters signatures, one can re-randomize
the signature each time and thus provide a totally new $\sigma_2$, which does not leak
any information about the original signature. The first part of the ciphertext,
$(c_1, c_2, c_3)$, does not leak any additional information under the DLin assumption.
As a consequence, the global ciphertext guarantees the semantic security of the
original signature if a freshly re-randomized signature is encrypted each time. We
can now apply the security of the high-level construction, and all the assumptions hold
under the DLin one:


Theorem 7. Our OSBE scheme is secure (i.e., escrow-free, semantically se- 
cure, and semantically secure w.r.t. the authority) under the DLin assumption 
(and the pseudo-random generator in the KDF). 

Our proposed scheme needs one message from R and one from S, so it is
round-optimal. The communication also consists of few elements: R sends 4 group
elements, and S answers with only 2 group elements and an $\ell$-bit string for the
masked $P \in \{0,1\}^\ell$. As explained in a previous remark, this has to be compared with
the RSA-based scheme from [27], which requires 2 elements in RSA groups (with
double-length modulus). For a 128-bit security level, using a standard Type-I bilinear
group implementation [16], we obtain a 62.5% improvement¹ in communication
complexity over the RSA-based scheme proposed in the original paper.

While reducing the communication cost of the scheme, we have also improved
its security, and it now fits the proposed applications. In [27], such schemes
were proposed for applications where someone wants to transmit confidential
information to an agent belonging to a specific agency, while the agent does
not want to give away his signature. Since [27] does not consider eavesdropping and
replay in its semantic security notion, nothing prevents an adversary from replaying a part
of a previous interaction to impersonate a CIA agent (to recall their example).
In practice, an additional secure communication channel, such as SSL, was
required in their security model, hence increasing the communication cost; our
protocol is secure by itself.


4 An Efficient Blind Signature 


4.1 Definitions 


A more formal definition of blind signatures is provided in the full version [6],
but we briefly recall it in this section. A blind signature scheme BS is defined
by a setup algorithm $\mathsf{BSSetup}(1^k)$ that generates the global parameters param,
a key generation algorithm $\mathsf{BSKeyGen}(param)$ that outputs a pair $(vk, sk)$, an
interactive protocol $\mathsf{BSProtocol}\langle S(sk), U(vk, m)\rangle$ which provides U with a signature
on $m$, and a verification algorithm $\mathsf{BSVerif}(vk, m, \sigma)$ that checks its validity.
The security of a blind signature scheme is defined through the unforgeability
and blindness properties. An adversary against unforgeability tries to generate
$q_s + 1$ valid message-signature pairs after at most $q_s$ complete interactions
with the honest signer. The blindness condition states that a malicious signer
should be unable to decide which of two messages $m_0, m_1$ has been signed first
in two executions with an honest user.


4.2 Our Instantiation 


We now present a new way to obtain a blind signature scheme in the standard
model under classical assumptions with a common reference string. This is an
improvement over [5]. We are going to use the same building blocks as before,
namely linear encryption, Waters signatures and an SPHF on linear ciphertexts. More
elaborate languages will be required, but only conjunctions and disjunctions of
classical languages, as done in [1] (see the full version [6]), hence the efficient
construction.

¹ The improvement is even more significant for the scheme described in the full version,
where the size drops down to 3/16.

Our blind signature scheme is defined by:


- $\mathsf{BSSetup}(1^k)$ generates a pairing-friendly system $(p, \mathbb{G}, g, \mathbb{G}_T, e)$ and an encryption
key $ek = (u, v, g) \in \mathbb{G}^3$. It also chooses at random $h \in \mathbb{G}$ and
generators $\vec{u} = (u_i)_{i \in [1,\ell]} \in \mathbb{G}^\ell$ for the Waters function. It outputs the
global parameters $param = (p, \mathbb{G}, g, \mathbb{G}_T, e, ek, h, \vec{u})$;

- $\mathsf{BSKeyGen}(param)$ picks at random a secret key $sk = x$ and computes the
verification key $vk = g^x$;

- $\mathsf{BSProtocol}\langle S(sk), U(vk, M)\rangle$ runs as follows, where U wants to get a signature
on $M$:

  • U computes the bit-per-bit encryption of $M$ by encrypting each $u_i^{M_i}$ in $b_i$:
    $\forall i \in [1, \ell]$, $b_i = \mathsf{Encrypt}(ek, u_i^{M_i}; (r_{i,1}, r_{i,2})) = (u^{r_{i,1}}, v^{r_{i,2}}, g^{r_{i,1}+r_{i,2}} u_i^{M_i})$.
    Then, writing $r_1 = \sum_i r_{i,1}$ and $r_2 = \sum_i r_{i,2}$, it computes the encryption
    $c$ of $vk^{r_1+r_2}$ with $\mathsf{Encrypt}(ek, vk^{r_1+r_2}; (s_1, s_2)) = (u^{s_1}, v^{s_2}, g^{s_1+s_2} vk^{r_1+r_2})$.
    U then sends $(c, (b_i))$;

  • On input of these ciphertexts, the algorithm S computes the corresponding
    SPHF, considering the language $\mathcal{L}$ of valid ciphertexts. This is the
    conjunction of several languages:

    1. One checking that each $b_i$ encrypts a bit in basis $u_i$: in $\mathsf{BLin}(ek, u_i)$;

    2. One considering $(d_1, d_2, c_1, c_2, c_3)$, which checks whether $(c_1, c_2, c_3)$ encrypts
       an element $d_3$ such that $(d_1, d_2, d_3)$ is a linear tuple in basis $(u, v, vk)$:
       in $\mathsf{ELin}(ek, vk)$, where $d_1 = \prod_i b_{i,1}$ and $d_2 = \prod_i b_{i,2}$.

  • S computes the corresponding Hash-value $v$, extracts $K = \mathsf{KDF}(v) \in \mathbb{Z}_p$,
    generates the blinded signature $(\sigma'_1 = h^x \delta^s, \sigma'_2 = g^s)$, where
    $\delta = u_0 \prod_i b_{i,3} = F(M) g^{r_1+r_2}$, and sends $(hp, Q = \sigma'_1 \times g^K, \sigma'_2)$;

  • Upon receiving $(hp, Q, \sigma'_2)$, using its witnesses and $hp$, U computes the
    ProjHash-value $v'$, extracts $K' = \mathsf{KDF}(v')$ and unmasks $\sigma'_1 = Q \times g^{-K'}$.
    Thanks to the knowledge of $r_1$ and $r_2$, it can compute $\sigma''_1 = \sigma'_1 \times (\sigma'_2)^{-r_1-r_2}$.
    Note that if $v' = v$, then $\sigma''_1 = h^x F(M)^s$, which together
    with $\sigma'_2 = g^s$ is a valid Waters signature on $M$. It can thereafter
    re-randomize the final signature $\sigma = (\sigma''_1 \cdot F(M)^{s'}, \sigma'_2 \cdot g^{s'})$.

- $\mathsf{BSVerif}(vk, M, \sigma)$ checks whether $e(\sigma_1, g) = e(h, vk) \cdot e(F(M), \sigma_2)$.
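
The unblinding step can be checked in a toy "exponent model", where each group element $g^a$ is stored as the integer $a$ modulo a small prime and the pairing is a product of exponents. This is only a sketch of the algebra under stated assumptions: it skips the SPHF entirely and assumes the user recovers the same mask $K$ as the signer (the case $v' = v$). The group order and all names are hypothetical, and the model has no security.

```python
import secrets

P_ORDER = 1019  # toy prime group order (assumption: illustration only)


def rand():
    return secrets.randbelow(P_ORDER - 1) + 1


def blind_sign_roundtrip(M):
    """Run the blinding/unblinding algebra on the bit list M; return True if
    the unblinded pair verifies as a Waters signature on M."""
    ell = len(M)
    h, x = rand(), rand()                    # h = g^h (as an exponent), sk = x
    vk = x                                   # vk = g^x
    u = [rand() for _ in range(ell + 1)]     # u_0, ..., u_ell as exponents
    F = (u[0] + sum(u_i for u_i, bit in zip(u[1:], M) if bit)) % P_ORDER

    # User: third components b_{i,3} = g^(r_i1 + r_i2) * u_i^{M_i}.
    coins = [(rand(), rand()) for _ in range(ell)]
    b3 = [(r1 + r2 + (u_i if bit else 0)) % P_ORDER
          for (r1, r2), u_i, bit in zip(coins, u[1:], M)]
    r_sum = sum(r1 + r2 for r1, r2 in coins) % P_ORDER   # r_1 + r_2 in the text

    # Signer: delta = u_0 * prod_i b_{i,3}, blinded signature, mask g^K.
    delta = (u[0] + sum(b3)) % P_ORDER
    s, K = rand(), rand()                    # K stands in for KDF(v)
    sigma1_blinded = (h * x + delta * s) % P_ORDER
    sigma2 = s
    Q = (sigma1_blinded + K) % P_ORDER

    # User: remove the mask (K' = K), then strip the factor g^(s * (r_1 + r_2)).
    sigma1 = (Q - K - sigma2 * r_sum) % P_ORDER

    # Check e(sigma1, g) == e(h, vk) * e(F(M), sigma2) in the exponent model.
    return sigma1 == (h * vk + F * sigma2) % P_ORDER
```

The function returns `True` for any bit list, reflecting the identity $\delta^s \cdot (g^s)^{-(r_1+r_2)} = F(M)^s$.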


The idea is to remove any kind of proof of knowledge from the protocol, which was
the main concern in [5], and to use an SPHF instead. This way, we obtain a protocol
where the user first sends $3\ell + 6$ group elements for the ciphertext, and receives
back $5\ell + 4$ elements for the projection key and 2 group elements for the blinded
signature. So $8\ell + 12$ group elements are used in total. This has to be compared
to $9\ell + 24$ in [5]. We reduce both the linear and the constant parts of the number
of group elements involved while relying on the same hypotheses. And the final
result is still a standard Waters signature.
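
The element counts quoted above can be tallied mechanically. The helper names below are hypothetical and the function only restates the arithmetic given in the text.

```python
def elements_ours(ell):
    """Group elements exchanged in the protocol for an ell-bit message."""
    user_ciphertexts = 3 * ell + 6     # the b_i ciphertexts and c
    projection_key = 5 * ell + 4
    blinded_signature = 2
    return user_ciphertexts + projection_key + blinded_signature


def elements_scheme_5(ell):
    """Figure quoted in the text for the scheme of [5]."""
    return 9 * ell + 24
```

For every message length the total equals $8\ell + 12$, which stays below $9\ell + 24$.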


Remark 8. In [17], Garg et al. proposed the first round-optimal blind signature
scheme in the standard model, without CRS. In order to remove the CRS, their
scheme makes use of ZAPs [14] and is quite inefficient. Moreover, its security
relies on a stronger assumption (namely, sub-exponential hardness of one-to-one
one-way functions). A natural idea is to replace the CRS in our scheme with the
Groth-Ostrovsky-Sahai ZAP [21] based on the DLin assumption. This change
would only double the communication complexity, but we do not know how
to prove the security of the resulting scheme. It remains a tantalizing open
problem to design an efficient round-optimal blind signature in the standard
model without CRS.


4.3 Security 
In blind signatures, one expects two kinds of security properties: 


- blindness, preventing the signer from recognizing which message was
signed during a specific interaction. Thanks to the re-randomizability of Waters
signatures and to linear encryption, this property is guaranteed in our scheme under the DLin
assumption;

- unforgeability, guaranteeing that the user cannot output more signed
messages than the number of actual interactions. In this scheme, thanks to the
extractability of the encryption (the simulator can know the decryption key),
one can show that the user cannot provide a signature on a message different
from the ones it asked to be blindly signed. Hence, the unforgeability relies
on the unforgeability of Waters signatures, that is, on the CDH assumption.


Theorem 9. Our blind signature scheme is blind under the DLin assumption
(and the pseudo-randomness of the KDF) and unforgeable under the CDH
assumption.


A full proof can be found in the full version [6]. 
Abstract. The hash-and-sign RSA signature is one of the most elegant
and well-known signature schemes, extensively used in a wide variety
of cryptographic applications. Unfortunately, the only existing analysis 
of this popular signature scheme is in the random oracle model, where 
the resulting idealized signature is known as the RSA Full Domain Hash 
signature scheme (RSA-FDH). In fact, prior work has shown several 
“uninstantiability” results for various abstractions of RSA-FDH, where 
the RSA function was replaced by a family of trapdoor random permu- 
tations, or the hash function instantiating the random oracle could not 
be keyed. These abstractions, however, do not allow the reduction and
the hash function instantiation to use the algebraic properties of the RSA
function, such as the multiplicative group structure of $\mathbb{Z}_n^*$. In contrast,
the multiplicative property of the RSA function is critically used in many
standard model analyses of various RSA-based schemes.

Motivated by closing this gap, we consider the setting where the 
RSA function representation is generic (i.e., black-box) but multiplica- 
tive, whereas the hash function itself is in the standard model, and can 
be keyed and exploit the multiplicative properties of the RSA function. 
This setting abstracts all known techniques for designing provably se- 
cure RSA-based signatures in the standard model, and aims to address 
the main limitations of prior uninstantiability results. Unfortunately, we 
show that it is still impossible to reduce the security of RSA-FDH to 
any natural assumption even in our model. Thus, our result suggests 
that in order to prove the security of a given instantiation of RSA-FDH, 
one should use a non-black box security proof, or use specific properties 
of the RSA group that are not captured by its multiplicative structure 
alone. We complement our negative result with a positive result, showing 
that the RSA-FDH signatures can be proven secure under the standard 
RSA assumption, provided that the number of signing queries is a-priori 
bounded. 
1 Introduction


Bellare and Rogaway [3] introduced the random oracle (RO) model as a
“paradigm for designing efficient protocols”. When following this paradigm,
one first builds a provably secure scheme assuming that access to a random
function is given, and (possibly) assuming some “standard” hardness assumption
(e.g., factoring is hard). One then instantiates the scheme by replacing
the random function with some concrete “hash function” (e.g., SHA-1). The
intuition underlying this paradigm is that a successful attack on the resulting
scheme should indicate (unexpected) weaknesses of the hash function used. This
paradigm (also known as the random oracle heuristic) has led to several highly
efficient constructions that are widely used in practice, such as the RSA Full Domain
Hash signature scheme (RSA-FDH) [3] and the RSA Optimal Asymmetric Encryption
Padding scheme (RSA-OAEP) [4]. Typically, however, little is known
about the provable security of such popular schemes in the standard model. In 
particular, it is unknown whether we can reduce their security to some “natural” 
assumption. 

In this work we revisit this question once again, focusing, in particular, on
the instantiability of the RSA hash-and-sign signatures. The RSA signature [31]
is one of the most elegant and well-known signature schemes. It is extensively
used in a wide variety of applications, and serves as the basis of several existing
standards such as PKCS #1 [32]. In its “textbook” form, the signature $\sigma$ of the
message $m$ is simply $\sigma = m^d \bmod n$, which can be verified by checking if $\sigma^e = m
\bmod n$, where $e$ is the public RSA exponent, and $d = e^{-1} \bmod \varphi(n)$. Of course,
the textbook variant is completely insecure, as any $\sigma$ is a valid signature of
some message $m = \sigma^e \bmod n$. The traditional fix, known as the RSA hash-and-sign
signature, is to hash the message $m$ before signing it using some “appropriate”
hash function $h$ (i.e., $\sigma = h(m)^d \bmod n$). The key question is how to instantiate
this function $h$.
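
The textbook scheme, its trivial forgery, and the hash-and-sign fix can be made concrete with a small Python sketch. The primes, the exponent and the choice of SHA-256 reduced modulo $n$ as the hash $h$ are assumptions made purely for illustration. The modulus is tiny and insecure, and this stand-in hash is not claimed to be a sound instantiation, which is precisely the open question discussed here.

```python
import hashlib

# Toy RSA key (assumption: tiny, insecure values chosen for illustration).
p, q = 61, 53
n = p * q                    # 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17
d = pow(e, -1, phi)          # private exponent, d = e^{-1} mod phi(n)


def sign_textbook(m):
    """sigma = m^d mod n."""
    return pow(m, d, n)


def verify_textbook(m, sigma):
    """Accept if sigma^e = m mod n."""
    return pow(sigma, e, n) == m % n


def forge_textbook(sigma):
    """Existential forgery without the secret key: any sigma is a valid
    signature of the message m = sigma^e mod n."""
    return pow(sigma, e, n)


def h(message):
    # Hypothetical stand-in for the hash function h: SHA-256 reduced mod n.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign_hashed(message):
    """sigma = h(m)^d mod n."""
    return pow(h(message), d, n)


def verify_hashed(message, sigma):
    """Accept if sigma^e = h(m) mod n."""
    return pow(sigma, e, n) == h(message)
```

In the textbook variant, `forge_textbook(1234)` yields a message for which `1234` verifies, with no use of `d`. In the hashed variant, the same trick would require finding a message whose hash equals a chosen value.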

Bellare and Rogaway [3] showed that in the random oracle model, where $h$ is
modeled as a truly random function (freely available to all the parties including
the adversary), the resulting RSA hash-and-sign signature (which they called
RSA Full Domain Hash, or RSA-FDH for short) is secure assuming that the
(standard) RSA assumption holds. When considering an actual instantiation
of h, though, a moment’s reflection shows that all known security notions for 
hash functions, such as collision-resistance or pseudorandomness, do not appear 
to help. In fact, even more “esoteric” notions, such as perfect one-way hash 
functions or verifiable random functions [5], are not sufficient either. On the 
other hand, no significant attacks on RSA-FDH signatures are known when h 
is instantiated using popular “cryptographic hash functions”, such as SHA-1. 
This gave rise to the following important question, which is the main focus of 
this paper. 


Is there an instantiation of RSA-FDH signature scheme (namely, of the 
hash function h) that can be proven secure under a natural assumption 
in the standard model? 
Of course, for any concrete hash function, one can “reduce” the security of RSA-FDH
signatures to that of RSA-FDH signatures, which is not very useful. So it
is important that the assumption used to argue the security of the scheme should 
be considerably simpler than the chosen message attack on RSA signatures. The 
best case scenario would be a reduction to the one-wayness of the RSA function 
(i.e., the standard “RSA assumption”), which is indeed what happened in the 
idealistic RO model. Unfortunately, we seem to be very far from this goal. In 
fact, several works, which we survey next, showed various arguments suggesting 
that no such reduction is likely to exist. 


Existing Impossibility Results. It is well known that in the general case 
the random oracle heuristic is false. Specifically, there exist schemes secure in 
the random oracle model that cannot be instantiated by any concrete hash func- 
tion [B6082]. Most counter-examples of this kind, however, are rather ar- 
tificial, and do not shed much light on the security of concrete schemes used 
in practice. The works that seem most relevant to the focus of this paper are
those of [12] and [27], described below (other related work is discussed
elsewhere in the paper).

Dodis et al. [12] considered a generalization of RSA-FDH signatures, known
as (general) Full Domain Hash (FDH) signatures. In such signatures, the signer
has access to an arbitrary trapdoor permutation $f$, and sets $\sigma = f^{-1}(h(m))$.¹
The main result of [12] rules out proving the security of an instantiation of FDH
by reducing it to the one-wayness of $f$ (or more generally, to any assumption
on f that is satisfied by a random trapdoor permutation). Their result, how- 
ever, does not capture reductions that use additional assumptions about f. In 
particular, it seems likely that if a proof of security of some instantiation of 
RSA-FDH does exist, then it would use the algebraic properties of the RSA 
function. To demonstrate this point, we present (see Section 1.1) an instantiation
of RSA-FDH under the standard RSA assumption that is secure as long
as the number of signing queries is a-priori bounded.² Our reduction is black
box, and critically uses the algebraic properties of $\mathbb{Z}_n^*$. (Indeed, [12] showed that
even one-time security of general FDH signatures cannot be black-box reduced
to the one-wayness of the trapdoor permutation.) In addition, the “RSA-based”
signatures [16,10,22], which can be proven secure in the standard model (but,
alas, no longer have the simple syntax of the RSA signature), critically use 
the algebraic properties of the RSA function. Finally, even in the random or- 
acle model, tighter security bounds are sometimes achieved using the algebraic 
properties of RSA (cf., [9], as compared to the generic proofs from trapdoor 
permutations [3,13]).

More recently, Paillier [27] looked at the question of instantiating RSA-FDH
using a fixed hash function (as opposed to a keyed family), and showed that no
such instantiation can be black-box reduced to the traditional RSA assumption,
assuming the so-called “RSA non-malleability” assumption. Informally, this assumption
states that calling the RSA inverter on arbitrary “permitted” inputs
$(n', e') \ne (n, e)$ does not help in breaking the instance $(n, e)$. We remark that, as
observed by Paillier in [27], this assumption is false for various reasonable interpretations
of “permitted” tuples $(n', e')$. More significantly, although the restriction
to a fixed hash function $h$ is consistent with the existing use in practice, from a theoretical
perspective this assumption is somewhat restrictive. For example, while the
result of [27] rules out proving even one-time security of RSA-FDH, our positive
result (see Section 1.1) circumvents this impossibility result by using a keyed hash
family.


¹ As in the case of RSA-FDH signatures, FDH signatures are known to be secure
when the hash function is modeled as a truly random function [3].
² With a different motivation, the same result was independently obtained by [21].


1.1 Our Results 


Our main result is a new negative result regarding the instantiability of RSA-FDH,
which addresses some of the limitations of the previous negative results
of [12,27]. To motivate this result, we start by describing our already mentioned
positive result.


Theorem 1 (Informal). Under the standard RSA assumption, for every polynomial
$t$ there exists an instantiation of RSA-FDH that is existentially unforgeable
against $t(k)$ signing queries (where $k$ is the security parameter). Furthermore,
the reduction treats the group $\mathbb{Z}_n^*$ and the potential adversary in a black-box
way.


The claimed construction is fully described in the full version, but here we
highlight some of its features. First, the result only works for bounded values of $t$,
since the description length of the constructed hash function is polynomial (quadratic)
in the number of signing queries. Second, our construction uses a keyed family
of hash functions (which is needed to overcome the impossibility result of [27]).
Third, the hash function depends on the RSA modulus $n$ and critically uses the
multiplicative structure of the RSA function (which is needed to overcome one
of the impossibility results of [12]). Finally, our reduction does not use any other
properties of the RSA function besides its multiplicative homomorphism over
$\mathbb{Z}_n^*$. Formally, this means that the reduction works given only oracle access to
the multiplication and the inversion operations of $\mathbb{Z}_n^*$.

We now turn to our main, negative result, which can be informally stated as 
follows: 


Theorem 2 (Informal). It is impossible to reduce the security of an instantiation
of RSA-FDH to a “natural” assumption (and in particular to the hardness
of RSA), provided that (1) the reduction treats the potential adversary in a
black-box way; (2) the public exponent $e$ used by the scheme is prime with
non-negligible probability; (3) the instantiation only “uses the multiplicative properties
of $\mathbb{Z}_n^*$”, and should “relativize” to any group isomorphic to $\mathbb{Z}_n^*$.


We now explain this result in more detail. First, our result holds even if the hash
function $h$ is allowed to be keyed and, moreover, to depend on the RSA modulus
$n$ (which was used in our positive result). More significantly, we allow both the
hash function and the hypothetical security reduction R to use the multiplicative
structure of $\mathbb{Z}_n^*$. Finally, we not only rule out reductions to the standard RSA
assumption, but also to other non-interactive “RSA-type” assumptions, such as
the “strong RSA assumption”.

However, our result also has three limitations, (1)-(3). First, and least important,
is the assumption that the reduction must treat the adversary in a black-box
way. This limitation is met by most existing reductions, and is also quite standard
in most black-box impossibility results. Technically, it means that the reduction
should work given oracle access to any (even inefficient) attacker breaking the 
security of RSA-FDH. Second, and more significant, is the fact that our current 
proof relies on the fact that the instantiation will use a prime exponent e (at 
least with non-negligible probability). Although this limitation appears to be an 
odd artifact of our specific proof technique, and also seems to be met by most 
known RSA instantiations, it does leave a possibility for a secure RSA-FDH 
instantiation always using some composite exponent $e$. Finally, and most significantly,
we assume that the reduction “treats the multiplicative RSA group $\mathbb{Z}_n^*$
in a black-box manner”. This is formalized (see Section B) using the notion of 
generic groups [B3253]. Informally, though, it means that nothing is assumed 
about a group element, apart from what was revealed through the performed 
group operations (i.e., multiplication, inverse and equality check). In particular, 
an algorithm that treats $\mathbb{Z}_n^*$ in a black-box way should perform equally well given
oracle access to any group isomorphic to $\mathbb{Z}_n^*$ (without knowing the isomorphism).

With this intuition in mind, we can interpret Theorem 2 as an indication that
in order to prove the security of a given instantiation of RSA-FDH, one should 
use a non-black box security proof, or use properties of the RSA group, that 
are not captured by the generic group abstraction. To the best of our knowledge, 
all known positive results on building “RSA-type” signatures (including our
new positive result in Theorem 1, the standard model constructions of [16,10,22],
and the random-oracle based analysis of [3,9]) treat $\mathbb{Z}_n^*$ as a black box, and
only use its multiplicative structure. Thus, although still restrictive, our result
rules out all known techniques for proving the security of RSA-based signatures,
which was not the case for the previous results of [12,27]. Still, the restriction of
the reduction to only use the multiplicative structure of $\mathbb{Z}_n^*$ is quite significant,
which raises the question of whether this restriction could be relaxed.


Removing Generic Groups? Unfortunately, removing (or even relaxing) the 
above mentioned restriction appears to be very challenging. Intuitively, with our 
current techniques (see more below) we must be able to construct an algorithm 
Forger which, given any (family of) hash function(s) h, should be able to (1) break 
the RSA-FDH instantiation using this h, and, yet, (2) do so by only forging the 
signature which the reduction R must already “know” (so that Forger never 
helps R compute something which R does not know to begin with, potentially 
helping R to break some hardness assumption). In particular, satisfying conflict- 
ing properties (1) and (2) seems to require some kind of “reverse-engineering” 
(or “de-obfuscation” ) techniques on h which seem to be completely beyond our 
current capabilities, without placing any restriction on the reductions we allow. 
Indeed, the introduction of the generic group model was precisely the step which
(a) allowed our forger to “reverse engineer” the given hash function $h$ (so as to
provably satisfy properties (1)-(2) above), and, yet, (b) allowed the reduction to
use the algebraic properties of $\mathbb{Z}_n^*$.


1.2 Our Technique 


On a very high level, our proof follows the approach of [12] used to prove that
there exists no fully black-box reduction from (general) FDH signature schemes
to the one-wayness of random functions. [12] defined an oracle Forger relative
to which no FDH signature scheme is secure, yet Forger does not help inverting
a random function. In more detail, on input $(h, \{\sigma_i\}_{i \in [t]})$, Forger checks that
(1) $\{\sigma_i\}$ are valid signatures for the messages $1, \ldots, t$ (i.e., $f(\sigma_i) = h(i)$ for every
$i \in [t]$, where $f$ is the random function), (2) the evaluation of $h(1), \ldots, h(t)$
does not query $f$ on any element of $\{\sigma_i\}$, and (3) $t$ is at least equal to $|h|$,
the description size of $h$. If positive, Forger returns the signature of 0 (i.e.,
$f^{-1}(h(0))$).
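
The three checks performed by Forger can be sketched in Python. This is an illustrative model under explicit assumptions, not the construction of [12]: the random function $f$ is replaced by a random permutation on a small domain so that $f^{-1}$ is well defined, the hash is passed as a callable `h(f, i)` that may query `f`, and the description size $|h|$ is supplied by the caller as the hypothetical parameter `h_size`.

```python
import random


class RandomPermutation:
    """Toy stand-in for f: a random permutation on {0, ..., 2^n_bits - 1}
    that records every point it is queried on."""

    def __init__(self, n_bits, seed=0):
        rng = random.Random(seed)
        self.table = list(range(2 ** n_bits))
        rng.shuffle(self.table)
        self.inverse = {y: x for x, y in enumerate(self.table)}
        self.queries = set()

    def __call__(self, x):
        self.queries.add(x)
        return self.table[x]


def forger(f, h, h_size, sigs):
    """sigs[i - 1] is the claimed signature of message i, for i = 1..t.
    Returns the signature of message 0, or None if a check fails."""
    t = len(sigs)
    f.queries = set()
    hashes = [h(f, i) for i in range(1, t + 1)]
    touched = set(f.queries)          # points on which evaluating h queried f

    # (1) every sigma_i is a valid signature: f(sigma_i) = h(i)
    if any(f.table[s] != y for s, y in zip(sigs, hashes)):
        return None
    # (2) the evaluation of h(1..t) never queried f on a signature
    if touched & set(sigs):
        return None
    # (3) t is at least the description size of h
    if t < h_size:
        return None
    return f.inverse[h(f, 0)]
```

An attacker with access to the signer obtains `sigs` honestly and then receives a valid signature on message 0, whereas a query with too few or invalid signatures gets no answer.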

It is clear that Forger can be used to break the existential security of any FDH
scheme: the attacker uses Sign, the signer of the scheme, to compute $\{\sigma_i\}_{i \in [t]}$ for
some $t > |h|$, and then calls Forger on $(h, \{\sigma_i\})$, where we assume without loss of
generality that condition (2) above holds with respect to this query (otherwise,
faking a signature without Forger is easy). On the other hand, [12] showed that
an efficient algorithm (with oracle access to $f$, but not to Sign) cannot provide
all these signatures. Thus, Forger is useless in these settings, and in particular a
black-box reduction (i.e., algorithm) cannot make use of Forger for inverting a
random function, proving the main result of [12].

Intuitively, Forger is useless for an algorithm with no access to Sign, for the
following reason. Fix some efficient oracle-aided algorithm R and let $\{0,1\}^n$ be
the domain of the random function $f$. Since a random function is one way,
the only elements that R can invert are those elements it previously received
as answers to its $f$-queries. Hence (since $f$ is random), R only knows how to
invert random elements inside $\{0,1\}^n$. Since it takes at least $t$ bits to describe $t$
random elements in $\{0,1\}^n$ (actually, it takes $tn$ bits) and since the evaluation
of $h(1), \ldots, h(t)$ does not query $f$ on elements inside $\{\sigma_i\}_{i \in [t]}$, there must exist
$h(i) \in \{h(1), \ldots, h(t)\}$ that R does not know how to invert, and thus cannot
provide a valid signature for the message $i$.

Moving to our setting, we focus for concreteness on fully black-box reductions
from RSA-FDH to the hardness of RSA (i.e., such reductions use the multiplicative
RSA group $\mathbb{Z}_n^*$ and the adversary in a black-box way). The black-boxness in
the RSA group tells us that such a reduction should work with respect to any
group isomorphic to $\mathbb{Z}_n^*$. In particular, it should work well with respect to the
group $\pi(\mathbb{Z}_n^*)$, obtained by renaming the elements of $\mathbb{Z}_n^*$ according to a random
permutation $\pi$ over $\mathbb{Z}_n^*$ (i.e., $a \cdot b$ is defined as $\pi(\pi^{-1}(a) \cdot \pi^{-1}(b) \bmod n)$).
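
The renamed group $\pi(\mathbb{Z}_n^*)$ is easy to build explicitly for a small modulus. The sketch below is illustrative only: the function name, the seeded shuffle and the dictionary representation are assumptions. It exposes multiplication and inversion on the renamed labels, which is all a black-box algorithm gets to use.

```python
import math
import random


def renamed_group(n, seed=0):
    """Return (pi, mul, inv) for the group pi(Z_n^*): the elements of Z_n^*
    relabelled by a random permutation pi, with the induced group operations."""
    elems = [a for a in range(1, n) if math.gcd(a, n) == 1]
    labels = elems[:]
    random.Random(seed).shuffle(labels)
    pi = dict(zip(elems, labels))
    pi_inv = {label: a for a, label in pi.items()}

    def mul(a, b):
        # a . b := pi(pi^{-1}(a) * pi^{-1}(b) mod n)
        return pi[(pi_inv[a] * pi_inv[b]) % n]

    def inv(a):
        # inverse of a label: pi((pi^{-1}(a))^{-1} mod n)
        return pi[pow(pi_inv[a], -1, n)]

    return pi, mul, inv
```

By construction $\pi$ is an isomorphism, so `mul(pi[2], pi[3])` equals `pi[6]` modulo 35, yet the labels themselves reveal nothing about the underlying integers.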

Given the above understanding, the first attempt would be to define Forger
analogously to that of [12]. Namely, on input $(n, e, h, \{\sigma_i\}_{i \in [t]})$, Forger checks
that (1) $\sigma_i^e = h(i)$ for every $i \in [t]$, (2) the evaluation of $h(1), \ldots, h(t)$ does
not compute $\sigma_i$ for some $i \in [t]$, and (3) $t \ge |h|$. If positive, Forger returns the
signature of 0 (i.e., $h(0)^d$, for $d = e^{-1} \bmod \varphi(n)$), where all group operations are
over the group $\pi(\mathbb{Z}_n^*)$.

We would like to argue that if $\pi$ is chosen at random, then the only way to
make a non-aborting query to Forger is via using Sign, the signer of the scheme.
It would then follow that Forger is useless for an algorithm R that has no access
to Sign (and in particular to a black-box reduction). It turns out, however, that
in our setting such R can make non-aborting calls to Forger. The issue is that
unlike in the setting of [12], R can make use of the algebraic structure of $\mathbb{Z}_n^*$ to
construct a non-aborting query to Forger. For instance, R can compute $\{j^e\}_{j\in[\ell]}$,
and assuming some reasonable mapping M from $[t = \ell^2]$ to $\{j \cdot k\}_{j,k\in[\ell]}$, let
$h(i) = M(i)^e \bmod n$ and $\sigma_i = M(i)$. Since the evaluation of $h(1),\ldots,h(t)$ does
not query an element of $\{\sigma_i\}_{i\in[t]}$, it follows that $(n,e,h,\{\sigma_i\}_{i\in[t]})$ is a non-
aborting query.³ Alternatively, if R can break the RSA assumption over $\pi(\mathbb{Z}_n^*)$
(say, if it knows the factorization of n), then it can set $h(i) = i$ and compute
$\sigma_i = h(i)^d$ (using the factorization of n to compute d).
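
The first kind of query above can be checked on toy numbers. The following is a minimal sketch, assuming the identity renaming (plain $\mathbb{Z}_n^*$ rather than a random $\pi$), the toy modulus n = 3233 = 61 * 53 with e = 17, and $\ell = 4$; all variable names are illustrative and do not come from the paper.

```python
# Toy check of the "degenerated" query: sigma_i = j*k and h(i) = (j*k)^e mod n.
# Assumptions (not from the paper): identity renaming, n = 61 * 53, e = 17, ell = 4.
n, e = 3233, 17
ell = 4

# The only values h needs to hardwire: the table {j^e mod n} for j in [ell].
power_table = {j: pow(j, e, n) for j in range(1, ell + 1)}

# A simple mapping M from the t = ell^2 messages to pairs (j, k).
pairs = [(j, k) for j in range(1, ell + 1) for k in range(1, ell + 1)]
t = len(pairs)

# h(i) is computed from the table alone; sigma_i is never touched by h.
h_values = [(power_table[j] * power_table[k]) % n for j, k in pairs]
sigmas = [(j * k) % n for j, k in pairs]

# Every sigma_i is a valid "signature": sigma_i^e = h(i) mod n,
# although neither the secret exponent d nor a signer was used.
all_valid = all(pow(s, e, n) == hv for s, hv in zip(sigmas, h_values))
```

The sketch only confirms the arithmetic identity; the counting condition on the description length of h is the subject of footnote 3.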

Fortunately, we manage to prove that a non-aborting query of R is either
“degenerated” (as in the first example) or indicates that R knows the factor-
ization of n. To handle the first case, we change Forger to identify and abort
on degenerated queries. We also show that it is easy to forge a signature
with respect to a degenerated h (i.e., h that is part of a degenerated query),
even without the help of Forger. Namely, we show that there is no secure RSA-
FDH scheme relative to the modified Forger. We then show that with respect
to this modified Forger, one can efficiently extract the factorization of n from an
algorithm that produces a non-aborting query. It follows that for any efficient
algorithm R with oracle access to Forger, there exists an efficient algorithm, with
no access to Forger, that emulates $R^{\mathsf{Forger}}$ well. In other words, we prove that
Forger is useless for the class of efficient algorithms with no oracle access to Sign.

Proving the above intuition is the main challenge of this work, and we achieve
that using a novel adaptation of the Gennaro-Trevisan [17] short description
paradigm, described below, to the generic groups realm.⁴


The Gennaro-Trevisan Short Description Paradigm and Its Adaptation
to Generic Groups. Loosely, [17] shows that an efficient algorithm that
inverts a random function too well can be used to give a too short description for
a random function (and thus cannot exist). This elegant approach has turned
out to be extremely powerful for proving impossibility results in the
random functions realm, which typically imply black-box impossibility results
for one-way functions/permutations based constructions. While the Gennaro-
Trevisan paradigm (from now on, the GT paradigm) has several extensions (e.g.,
[15, 35, 19, 20, 30]), all are given in the random functions realm.


3 Note that to describe h it suffices to describe the set $\{j^e\}_{j\in[\ell]}$. Thus $|h| \in O(\ell \log n)$,
which is smaller than t for large enough $\ell$.

4 A side benefit of this proof technique is an alternative proof of the equivalence of
RSA and factoring over generic groups, first proven by Aggarwal and Maurer [1]
(the latter, however, also prove it over “generic rings”).

We would like to apply a similar approach for arguing that an algorithm that
makes a non-aborting query to Forger can be used either to factor n, or to
“compress” the random permutation $\pi$ (which defines the group $\pi(\mathbb{Z}_n^*)$). Since
compressing $\pi$ is impossible, it follows that a non-aborting query of such an
algorithm can be used to factor n. Hence, such queries can be answered efficiently,
yielding the existence of an efficient emulator (without access to Forger) for any
efficient algorithm.⁵

Extending the GT paradigm to our settings involves many complications. The 
main part of the GT paradigm is using the (hypothetical) attacker to reconstruct 
a random function using (too) short advice. This reconstruction involves emulat- 
ing the attacker, where the key point is to do this without “wasting information” : 
any bit used to emulate, should give a bit of information about the (random) 
function. Doing the latter is quite easy for random functions; the answer to any 
query of the attacker gives the same amount of information about the function 
(i.e., the info that it maps the query input to the provided output). The only sub- 
tlety is that there are repeated queries (which are clearly wasteful), but handling 
such queries is easy: simply keep track of the query history on the emulation. 

In our setting, however, things get much more complicated. To begin with,
there might be non-repeating queries whose answers yield very little informa-
tion about the random group $\pi(\mathbb{Z}_n^*)$ (and therefore about $\pi$). For instance, for
some n’s there are only four possible answers for the query $a^{\phi(n)/4}$ over $\pi(\mathbb{Z}_n^*)$.
Thus, roughly speaking, the answer for this query contains only two bits of in-
formation about $\pi$. More generally, it appears that one can create much more
intricate examples; e.g., when the answer to the query follows a very complicated
distribution, based on the answers given so far.
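
The "four possible answers" phenomenon is easy to observe on a small modulus. A minimal sketch, assuming the toy choice n = 7 * 11 (both factors are safe primes, so $\phi(n)/4 = 15$ is odd) and the identity renaming; under a random renaming $\pi$ the four labels would differ, but their number would not.

```python
import math

# Toy modulus (an illustrative assumption): n = p * q with p = 7 and q = 11.
p, q = 7, 11
n = p * q
phi = (p - 1) * (q - 1)  # 60

# Collect every value of a^(phi(n)/4) over the group Z_n^*.
answers = {pow(a, phi // 4, n) for a in range(1, n) if math.gcd(a, n) == 1}

# a^15 is +1 or -1 modulo each prime factor, hence only four combinations.
num_answers = len(answers)
```

Here every answer is one of the four square roots of 1 modulo n, so a single reply narrows down the renaming by only about two bits.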

An even more challenging task is proving the dichotomy that a non-aborting
query can either be used to (efficiently) factor n, or implies a (too) short de-
scription of $\pi$. Handling the above challenges requires an intimate understanding
of the algebraic structure of the group $\mathbb{Z}_n^*$, in particular of the set of solutions
for linear equations over this group, and critically uses the fact that factoring is
solvable in sub-exponential time [11, 34].


1.3 Other Related Work 


We briefly mention other known results concerning the uninstantiability of pop- 
ular signature and encryption schemes that can be proven secure in the random 
oracle model. Paillier and Vergnaud, [28], showed that many popular discrete 
log based signatures (including ElGamal, DSA and Schnorr) cannot be reduced 
to the discrete log assumption in the standard model, using the so called “alge- 
braic” reductions. (Similar results also hold for related GQ signatures under the 
RSA assumption.) Although technically incomparable to our “generic group” 
modeling, conceptually such reductions are related to our assumption that the 


5 In addition, since non-aborting queries are easy to generate assuming that RSA
is easy over $\pi(\mathbb{Z}_n^*)$, the above would immediately yield that RSA is equivalent to
factoring over (random) $\pi(\mathbb{Z}_n^*)$, and thus over generic groups.

reduction can only use the multiplicative structure of a given group. Indeed,
in both cases the “meta-reduction” can eventually figure out the multiplicative
relations used by the reduction R in its queries to the attacker. The main dif-
ference lies in the way the reduction can prepare its queries to the attacker.
While the generic group modeling allows the reduction R to use some “hidden
values” related to the assumption that R is trying to break, “algebraic” reduc-
tions do not allow this flexibility. Thus, many of the technical difficulties in the
generic group modeling (e.g., extracting the hidden representations computed
by the reduction “on the side”) are somewhat trivialized when restricted to “al-
gebraic” reductions. Additionally, the results of [28] are specific to reductions
from a concrete assumption (e.g., discrete log), and are conditional on another
assumption (e.g., “one-more” discrete log). In contrast, our results are uncon-
ditional and rule out all starting assumptions, but only in the generic group
model.

Finally, in the realm of factoring/RSA-based CCA encryption, Paillier and
Villar [29] and Brown et al. [6] showed uninstantiability results analogous to the
already-mentioned RSA signature result of [27].


Paper Organization 


In Section 2 we formally define RSA-FDH and its security in the generic group
model and the type of reductions we rule out. Our main result, regarding the
impossibility of existentially unforgeable RSA-FDH against an unbounded number
of signing queries, is proven in Section 3. However, the proof of our main technical
lemma using the GT short description paradigm is omitted and can be found in
the full version [14].


2 RSA-FDH in the Generic Group Model 


In the following we first formally define what we mean by generic group model, 
then extend the standard definitions of RSA-FDH to this model and finally 
define weakly black-box proofs of security. 


2.1 The Generic Group Model 


There are different ways to interpret what it means to “treat the multiplicative
RSA group $\mathbb{Z}_n^*$ in a black-box way” (see Theorem B). In the generic algorithm
model due to Maurer [23], “generic” algorithms do not have direct access to
the group elements, but rather to a “black box” containing each element. The
only operations allowed with these boxes are the group operations (inverse and
multiplication) and comparing two boxes for equality. The formulation we have
chosen here, which we simply call the generic group model, is somewhat less
abstract. An algorithm in our model has oracle access to a group isomorphic to
$\mathbb{Z}_n^*$ (specifically, the group resulting from renaming the elements of $\mathbb{Z}_n^*$ according to
some random permutation), through which it can perform the group operations.
Unlike the generic algorithm model, however, in our model algorithms do
have access to the representation of the group elements and can manipulate
them.

Since any algorithm that “works well” in the generic algorithm model (e.g.,
breaks the RSA assumption) implies an algorithm that works equally well in
our model with respect to any group isomorphic to $\mathbb{Z}_n^*$, an impossibility result
in our model implies a similar result in the model of Maurer [23]. Namely, our
model can be viewed as a model for proving impossibility results in the generic
algorithm model.

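We formally define our model as follows: for $n \in \mathbb{N}$, let $\Pi_{\phi(n)}$ be the set of all
permutations from $\mathbb{Z}_n^*$ to $\mathbb{Z}_n^*$. For $\pi \in \Pi_{\phi(n)}$, we denote by $\pi(\mathbb{Z}_n^*)$ the group
induced by the group $\mathbb{Z}_n^*$ where each element of $\mathbb{Z}_n^*$ is renamed according to
$\pi$. More specifically, the group operations over $\pi(\mathbb{Z}_n^*)$ are defined as follows: the
inverse of $a \in \mathbb{Z}_n^*$ is $\pi((\pi^{-1}(a))^{-1} \bmod n)$ and the (group) product of $a,b \in \pi(\mathbb{Z}_n^*)$
is $\pi(\pi^{-1}(a) \cdot \pi^{-1}(b) \bmod n)$. By $\Pi(\mathbb{Z}_n^*)$ we denote the multiset of all groups
$\pi(\mathbb{Z}_n^*)$, and we let $\mathcal{G} = \{G = \{G_n \colon G_n \in \Pi(\mathbb{Z}_n^*)\}_{n\in\mathbb{N}}\}$ (i.e., $\mathcal{G}$ consists of sets of
groups, where each set contains a group of $\Pi(\mathbb{Z}_n^*)$ for every $n \in \mathbb{N}$).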

Abusing notation, we view $G \in \mathcal{G}$ as an oracle that given as input $n \in \mathbb{N}$ and
one [resp., two] elements of $G_n$ (i.e., of $\mathbb{Z}_n^*$), returns the group inverse [resp.,
the group product] of the elements (if the oracle G is given as input an element
outside $G_n$, it returns $\perp$), and let $G_n(\cdot) = G(n,\cdot)$. Given a sequence of group
operations (e.g., $a \cdot b^{-1}$), we sometimes add the term $[G_n]$, to indicate that
the operations are done with respect to the group $G_n$. In the following, abusing
notation again, we will write $G \leftarrow \mathcal{G}$, where this sampling is not well defined
because $\mathcal{G}$ is an infinite set. However, we can assume lazy sampling, namely for
every query which contains a new n, $G_n$ is sampled uniformly at random from
$\Pi(\mathbb{Z}_n^*)$ (which is a finite set).
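
The oracle just described, including the lazy sampling of $G_n$, can be sketched in a few lines. This is an illustrative toy, not the paper's construction: the class and method names are hypothetical, the permutation is stored explicitly (so it is only feasible for tiny n), and `None` stands in for $\perp$.

```python
import math
import random


class RenamedGroup:
    """Toy oracle for pi(Z_n^*): Z_n^* with its elements renamed by a random pi."""

    def __init__(self, n, rng):
        self.n = n
        elements = [a for a in range(1, n) if math.gcd(a, n) == 1]
        labels = elements[:]
        rng.shuffle(labels)
        self.pi = dict(zip(elements, labels))             # pi: Z_n^* -> Z_n^*
        self.pi_inv = {v: k for k, v in self.pi.items()}  # pi^{-1}

    def mul(self, a, b):
        # Product of a and b: pi(pi^{-1}(a) * pi^{-1}(b) mod n); None plays bottom.
        if a not in self.pi_inv or b not in self.pi_inv:
            return None
        return self.pi[(self.pi_inv[a] * self.pi_inv[b]) % self.n]

    def inv(self, a):
        # Inverse of a: pi((pi^{-1}(a))^{-1} mod n); None plays bottom.
        if a not in self.pi_inv:
            return None
        return self.pi[pow(self.pi_inv[a], -1, self.n)]


class GenericGroupOracle:
    """Lazily samples one renamed group G_n for every new modulus n."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.groups = {}

    def group(self, n):
        if n not in self.groups:
            self.groups[n] = RenamedGroup(n, self.rng)
        return self.groups[n]
```

For example, `GenericGroupOracle(seed=1).group(35)` returns the same renamed group on every later call with n = 35, which mirrors the lazy-sampling convention.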


2.2 RSA-FDH Signature Schemes in the Generic Group Model 


An RSA-FDH signature scheme over $G \in \mathcal{G}$ is defined as follows:


Definition 1 (RSA-FDH signature scheme in the generic group
model). An RSA-FDH signature scheme $\Sigma^{\mathcal{G}}$ in the generic group model con-
sists of the following triplet of oracle-aided PPT’s (KeyGen, Sign, Verify):


— Given oracle access to $G \in \mathcal{G}$ and input $1^k$, $\mathsf{KeyGen}^G$ outputs a “public key”
(n,e,h), where $n \in \mathbb{N}$ is a product of two primes, $e \in \mathbb{Z}_{\phi(n)}^*$ and h is a (hash)
function, represented as an oracle-aided circuit mapping values into $\mathbb{Z}_n^*$, and
a “secret key” $d = e^{-1} \bmod \phi(n)$.

— Given oracle access to $G \in \mathcal{G}$, input $n \in \mathbb{N}$, $d \in \mathbb{Z}_{\phi(n)}^*$, a circuit h mapping
values into $\mathbb{Z}_n^*$ and a “message” m in the domain of h, $\mathsf{Sign}^G$ outputs the
“signature” $h^G(m)^d$ $[G_n]$.

— Given oracle access to $G \in \mathcal{G}$, input $n \in \mathbb{N}$, $e \in \mathbb{Z}_{\phi(n)}^*$, a circuit h mapping
values into $\mathbb{Z}_n^*$, a “message” m in the domain of h and $\sigma \in \mathbb{Z}_n^*$, $\mathsf{Verify}^G$
outputs one iff $\sigma^e = h^G(m)$ $[G_n]$.


For $G \in \mathcal{G}$, we let $\Sigma^G$ be the instantiation of $\Sigma^{\mathcal{G}}$ with G.
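
To make the hash-then-exponentiate shape of the definition concrete, here is a minimal sketch in the standard model. It assumes the identity renaming (plain $\mathbb{Z}_n^*$), a toy key (n = 61 * 53, e = 17) and SHA-256 reduced modulo n as a stand-in for the hash circuit h; none of these choices come from the paper, and the parameters are far too small to be secure.

```python
import hashlib
import math

# Toy key material (illustrative assumption; far too small for real use).
P, Q = 61, 53
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 17
D = pow(E, -1, PHI)  # secret exponent d = e^{-1} mod phi(n)


def full_domain_hash(message: bytes) -> int:
    """Stand-in for h: hash into Z_n^*, retrying with a counter if needed."""
    counter = 0
    while True:
        digest = hashlib.sha256(message + counter.to_bytes(4, "big")).digest()
        value = int.from_bytes(digest, "big") % N
        if value != 0 and math.gcd(value, N) == 1:
            return value
        counter += 1


def sign(message: bytes) -> int:
    """Sign outputs h(m)^d in Z_n^*."""
    return pow(full_domain_hash(message), D, N)


def verify(message: bytes, sigma: int) -> bool:
    """Verify accepts iff sigma^e = h(m) in Z_n^*."""
    return pow(sigma, E, N) == full_domain_hash(message)
```

In the generic group model every multiplication inside `pow` would instead be an oracle call to $G_n$; the sketch omits that layer for brevity.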


Security Definition. The following definition realizes the security of bounded 
and unbounded existential unforgeability under chosen message attack of an 
RSA-FDH signature in the generic group model, analogously to that of the 
standard model. 


Definition 2 (security of RSA-FDH signature in the generic group
model). An oracle-aided algorithm F breaks the security of an RSA-FDH
signature scheme $\Sigma^{\mathcal{G}}$ = (KeyGen, Sign, Verify), if

$$\Pr_{G \leftarrow \mathcal{G},\ (sk,pk) \leftarrow \mathsf{KeyGen}^G(1^k)}\Bigl[(m,\sigma) \leftarrow F^{G,\mathsf{Sign}^G(sk,pk,\cdot)}(pk) \colon \mathsf{Verify}^G(\sigma,m,pk) = 1 \ \wedge\ \mathsf{Sign} \text{ was not queried on } m\Bigr] > \mathrm{neg}(k).$$

A signature scheme $\Sigma^{\mathcal{G}}$ is EU-CMA-secure, if no (oracle-aided) PPT breaks its
security, where $\Sigma^{\mathcal{G}}$ is t-EU-CMA-secure, if no PPT breaks its security when
restricted to query Sign at most t(k) times.


2.3 Weakly Black-Box Proofs of Security 


Since we would like to rule out an EU-CMA-secure scheme, we ask the security 
proof of the scheme to be realized via a “black-box reduction” (as discussed 
in the introduction, we have very little chance to rule out a general proof of 
security). On the other hand, we consider a very weak form of such a reduction 
(which strengthens our main impossibility result). 


Definition 3 (weakly black-box proof of security of RSA-FDH). An
RSA-FDH signature scheme $\Sigma^{\mathcal{G}}$ = (KeyGen, Sign, Verify) in the generic group
model has a weakly black-box proof of security based on an assumption X, if there
exists an oracle-aided PPT R such that if X is true, then the following holds: let
F be a (possibly unbounded) adversary that breaks the security of $\Sigma^{\mathcal{G}}$ (see Def-
inition 2), then for any PPT Emul there exists a polynomial-length distribution
ensemble $D = \{D_k\}_{k\in\mathbb{N}}$ such that

$$\mathrm{SD}\Bigl(\bigl(x, R^{G,F^G}(1^k,x)\bigr), \bigl(x, \mathsf{Emul}^G(1^k,x)\bigr)\Bigr)_{G \leftarrow \mathcal{G},\ x \leftarrow D_k} > \mathrm{neg}(k).$$ ⁶


Remark 1 (A black-box proof implies a weakly black-box proof). Assuming that
X is true, the above intuitively asks that a security breach of $\Sigma^{\mathcal{G}}$ implies that a


6 Note that F is an adversary which expects oracle access to Sign and R can control
the responses to these queries of F. The same does not hold for the queries of F
to G.

(slightly) non-trivial task can be performed. Specifically, an efficient oracle-aided
algorithm can use a breaker of the scheme (in a black-box way) to sample some
unsamplable distribution. Note that this is a very modest demand and indeed,
it is implied by most black-box proofs of security one can think of.

Consider for instance a proof of security R that black-box reduces the security
of a scheme $\Sigma^{\mathcal{G}}$ to an assumption X, say to the hardness of factoring. It follows
that given any adversary F to $\Sigma^{\mathcal{G}}$, the algorithm $R^{G,F^G}$ factors integers too
well. Assume without loss of generality that $R^{G,F^G}(1^k,x)$, if it succeeds, outputs the
factorization of the integer x, let $D_k$ be the distribution that outputs an integer
$x = pq$, for two randomly chosen k-bit primes p and q, and consider the distribution
$\xi_k = (x, R^{G,F^G}(1^k,x))_{G\leftarrow\mathcal{G},\,x\leftarrow D_k}$ it induces. Now if factoring is hard, then there is no
efficient Emul such that $(x, \mathsf{Emul}^G(1^k,x))_{G\leftarrow\mathcal{G},\,x\leftarrow D_k}$ is (even computationally)
close to $\xi_k$. Namely, there is a weakly black-box proof of security for $\Sigma^{\mathcal{G}}$ based
on factoring.⁷


3 There Exists No RSA-FDH with a Weakly Black-Box 
Proof 


In this section we prove the main result of this paper. 


Theorem 3 (Theorem B, restated). Let $\Sigma^{\mathcal{G}}$ = (KeyGen, Sign, Verify)
be an RSA-FDH signature scheme in the generic group model in which
$\Pr_{G\leftarrow\mathcal{G},\,(n,e,h)\leftarrow\mathsf{KeyGen}^G(1^k)}[e \in \mathbb{P}] > \mathrm{neg}(k)$. If $\Sigma^{\mathcal{G}}$ has a weakly black-box proof of
security based on (an assumption) X, then X is false.


The proof of Theorem 3 immediately follows from the next lemma:


Lemma 1. Let $\Sigma^{\mathcal{G}}$ be as in Theorem 3, then there exist a family of oracles
$\mathsf{Forger} = \{\mathsf{Forger}_G\}_{G\in\mathcal{G}}$ and oracle-aided PPT’s F and Emul, such that the fol-
lowing hold:


1. For every $G \in \mathcal{G}$, $F^{G,\mathsf{Forger}_G}$ breaks the security of $\Sigma^G$.
2. For any oracle-aided PPT A and polynomial-length distribution ensemble $D = \{D_k\}_{k\in\mathbb{N}}$:

$$\mathrm{SD}\Bigl(\bigl(x, A^{G,\mathsf{Forger}_G}(1^k,x)\bigr), \bigl(x, \mathsf{Emul}^G(1^k,x,\mathrm{desc}(A))\bigr)\Bigr)_{G\leftarrow\mathcal{G},\ x\leftarrow D_k} = \mathrm{neg}(k),$$

where desc(A) denotes the description of the Turing Machine A.

Before proving Lemma 1, let us first use it for proving Theorem 3.


7 Note that there is nothing specific to the hardness of factoring in the above discussion,
but rather it seems to be generic to “any” hardness assumption (e.g., strong RSA).

Proof (of Theorem 3). Let $\Sigma^{\mathcal{G}}$ be an RSA-FDH scheme with
$\Pr_{G\leftarrow\mathcal{G},\,(n,e,h)\leftarrow\mathsf{KeyGen}^G(1^k)}[e \in \mathbb{P}] > \mathrm{neg}(k)$. Assume that $\Sigma^{\mathcal{G}}$ has a weakly
black-box proof of security based on (an assumption) X and let R be the
algorithm guaranteed by this proof. Let Emul be the algorithm guaranteed by
Lemma 1 with respect to $\Sigma^{\mathcal{G}}$. Lemma 1 yields that

$$\mathrm{SD}\Bigl(\bigl(x, \widehat{R}^{G,\mathsf{Forger}_G}(1^k,x)\bigr), \bigl(x, \mathsf{Emul}^G(1^k,x,\mathrm{desc}(\widehat{R}))\bigr)\Bigr)_{G\leftarrow\mathcal{G},\ x\leftarrow D_k} = \mathrm{neg}(k)$$

for any polynomial-length distribution ensemble $D = \{D_k\}$, where
$\widehat{R}^{G,\mathsf{Forger}_G}(\cdot) = R^{G,F^{G,\mathsf{Forger}_G}}(\cdot)$. Letting $F^G(\cdot) = F^{G,\mathsf{Forger}_G}(\cdot)$ and $\widehat{\mathsf{Emul}}^G(\cdot) =
\mathsf{Emul}^G(\cdot,\mathrm{desc}(\widehat{R}))$, it follows that

$$\mathrm{SD}\Bigl(\bigl(x, R^{G,F^G}(1^k,x)\bigr), \bigl(x, \widehat{\mathsf{Emul}}^G(1^k,x)\bigr)\Bigr)_{G\leftarrow\mathcal{G},\ x\leftarrow D_k} = \mathrm{neg}(k)$$

for any polynomial-length distribution ensemble D, yielding that X is false.


The rest of this section is devoted to proving Lemma 1. We find it more con-
venient, however, to prove a variant of Lemma 1 in which the emulator should
work for any (polynomial-size) family of circuits. Namely, we prove the follow-
ing lemma (in the following statement we only focus on the part that changed
compared to the original statement):


Lemma 2 (non-uniform variant of Lemma 1).


2. The following holds for any (no input) polynomial-size family of oracle-aided
circuits $\{C_k\}_{k\in\mathbb{N}}$:

$$\mathrm{SD}\Bigl(C_k^{G,\mathsf{Forger}_G},\ \mathsf{Emul}^G(1^k,\mathrm{desc}(C_k))\Bigr)_{G\leftarrow\mathcal{G}} = \mathrm{neg}(k),$$

where $C_k^{G,\mathsf{Forger}_G}$ denotes the output of $C_k$ given access to G and $\mathsf{Forger}_G$,
and desc($C_k$) denotes the description of $C_k$.


It is easy to see that the non-uniform lemma above yields the uniform Lemma 1.
In Section 3.1 we define the family of oracles Forger and the efficient algorithm
F that uses Forger to break any RSA-FDH scheme, in Section 3.3 we define the
emulator Emul, and in Section 3.4 we put things together to prove Lemma 2.


3.1 The Forger 


Recall (see Section 1.2) that Forger has to abort on “degenerated queries”,
essentially those queries that are easy to do over any group in $\Pi(\mathbb{Z}_n^*)$. To
determine whether a query $(n,e,h,\{\sigma_i\}_{i\in[t]})$ is degenerated, we measure the com-
plexity of the values $\{h(i)\}_{i\in[t]}$⁸ as a function of the group queries done through
their evaluations. Since the actual representation of these values is meaningless,
we only focus on their representation as functions of the “hardwired terms”:
the values used in the evaluation of $\{h(i)\}$ that first appear as an input to a group
oracle call. Note that any group element used in the evaluation of $\{h(i)\}$ can be
expressed using (only) these hardwired terms. To formally carry out the above dis-
cussion, we describe the evaluation of $\{h(i)\}$ as a computation over the following
group.

8 We actually mean $\{h^G(i)\}_{i\in[t]}$, but for notational convenience we will sometimes
omit the superscript G from h.


Definition 4 (The group Symb). The elements of Symb are equivalence
classes over the set of all finite strings “$u_1^{a_1} \cdots u_k^{a_k}$”, where the $u_i$’s are in $\mathbb{N}$
and the $a_i$’s are in $\mathbb{Z}$. The strings $c$ = “$u_1^{a_1} \cdots u_k^{a_k}$” and $c'$ = “${u'_1}^{a'_1} \cdots {u'_{k'}}^{a'_{k'}}$” are
in the same equivalence class, if for every $w \in \mathbb{N}$ it holds that
$\sum_{i\in[k]\colon u_i = w} a_i = \sum_{i\in[k']\colon u'_i = w} a'_i$. We identify a group element of Symb with any string of its
equivalence class. The unit element of Symb is the class identified by the empty
string $\varepsilon$ (or by “$2^1 \cdot 2^{-1}$” etc.), where $c \cdot c'$ is the equivalence class identified by the
string “$c \cdot c'$” and finally $c^{-1}$ is the class identified by the string “$u_1^{-a_1} \cdots u_k^{-a_k}$”.

We naturally identify an element “$u_1^{a_1} \cdots u_k^{a_k}$” $\in$ Symb with an element of
a given group V that contains $\{u_i\}_{i\in[k]}$, by identifying it with the result of
the sequence of operations it induces over V (i.e., “$u_1 \cdot u_2^{-1}$” with respect to
$V = \mathbb{Z}_n^*$ is identified with $u_1 \cdot u_2^{-1} \bmod n$). To avoid confusion over which
group a sequence of operations is taken, we typically suffix the sequence with
the term [V], indicating that it is done over the group V. It is clear that for
any two strings u and u’ that identify the same element of Symb (i.e., belong to
the same equivalence class), it holds that u = u’ [V] for any Abelian group V
containing u and u’.
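
One way to get a feel for Symb is to represent each equivalence class by its exponent vector. The sketch below is an illustration under that assumption: a class is a dictionary from a symbol to its total (non-zero) exponent, and the function names are hypothetical rather than taken from the paper.

```python
def normalize(factors):
    """Map a string given as [(u_1, a_1), ..., (u_k, a_k)] to its equivalence class.

    Two strings are equivalent iff, for every symbol w, the exponents attached
    to occurrences of w sum to the same value; zero totals are dropped.
    """
    totals = {}
    for symbol, exponent in factors:
        totals[symbol] = totals.get(symbol, 0) + exponent
    return {s: a for s, a in totals.items() if a != 0}


def symb_mul(c, c_prime):
    """Product in Symb: concatenate the two strings and normalize."""
    return normalize(list(c.items()) + list(c_prime.items()))


def symb_inv(c):
    """Inverse in Symb: negate every exponent."""
    return {s: -a for s, a in c.items()}


def evaluate(c, n):
    """Identify a class with an element of V = Z_n^* (symbols must be units mod n)."""
    result = 1
    for symbol, exponent in c.items():
        result = (result * pow(symbol, exponent, n)) % n
    return result
```

With this representation the unit element is the empty dictionary, and the final remark of the definition becomes the observation that `evaluate` depends only on the class, not on the string used to write it.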

Next we use the above terminology to syntactically describe the computation
of an oracle-aided circuit C, where we start by defining the hardwired terms deter-
mined by C’s computation. To simplify notation, we assume that a circuit evalu-
ates its gates one-by-one, and that its description determines this evaluation order.


Definition 5 (hardwired terms). Let C be an oracle-aided circuit, $G \in \mathcal{G}$
and $n \in \mathbb{N}$. The terms of C with respect to $G_n$, denoted $\mathrm{Terms}_{C,G,n}$, are those
values that appear either as inputs or as the answers to non-bottom queries of
C to $G_n$ (i.e., $G_n$ returns a non-bottom value). The hardwired terms of C with
respect to $G_n$, denoted $\mathrm{HardWired}_{C,G,n}$, are those elements inside $\mathrm{Terms}_{C,G,n}$ that
first appear as inputs to non-bottom queries to $G_n$. Finally, the answer terms
are those terms that appear as answers to non-bottom queries (might intersect
$\mathrm{HardWired}_{C,G,n}$). We assume that the elements of each of the above sets are
ordered according to the evaluation order.


We next use the syntax of the group Symb, to present any term as an expression 
of the hardwired terms. 


Definition 6 (canonical form). Let C, G and n be as in Definition 5. The
canonical form of $u \in \mathrm{Terms}_{C,G,n}$ with respect to (C,G,n), denoted $\mathrm{Can}_{C,G,n}(u)$,
is recursively defined as follows:

— If $u \in \mathrm{HardWired}_{C,G,n}$, let $\mathrm{Can}_{C,G,n}(u)$ be the element “$u^1$” $\in$ Symb.

— If u first appears as an output of a query $G_n(u',u'')$, let $\mathrm{Can}_{C,G,n}(u) =
\mathrm{Can}_{C,G,n}(u') \cdot \mathrm{Can}_{C,G,n}(u'')$ [Symb].

— Similarly, if u first appears as an output of $G_n(u')$, we let $\mathrm{Can}_{C,G,n}(u) =
\mathrm{Can}_{C,G,n}(u')^{-1}$ [Symb].


Let $\{v_i\}_{i\in[\ell]} = \mathrm{HardWired}_{C,G,n}$. Note that the canonical form of any
$u \in \mathrm{Terms}_{C,G,n}$ with respect to (C,G,n) can be uniquely written as
$\prod_{i\in[\ell]} v_i^{a_i}$ [Symb], where $a_i$ might be non-zero only if the hardwired term $v_i$
appears before u does (in the evaluation order of $C^G$). Finally, the canonical
forms of a set of terms, with respect to (C,G,n), are compactly represented using
the following matrix.


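Definition 7 (canonical-form matrix). Let C, G and n be as in Definition 5,
let $\{v_j\}_{j\in[\ell]} = \mathrm{HardWired}_{C,G,n}$ and let $W = \{u_i\}_{i\in[t]} \subseteq \mathrm{Terms}_{C,G,n}$. The matrix
$M^{C,G,n}(W) \in \mathbb{Z}^{t\times\ell}$ is defined as $\{a_{i,j}\}_{i\in[t],\,j\in[\ell]}$, assuming that $\mathrm{Can}_{C,G,n}(u_i) =
\prod_{j\in[\ell]} v_j^{a_{i,j}}$ [Symb] for every $i \in [t]$.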
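
Definitions 5 to 7 can be traced mechanically on a log of oracle queries. The following sketch assumes a hypothetical trace format, a list of `(operation, inputs, output)` triples in evaluation order, and ignores bottom answers and circuit inputs that are never queried; it is an illustration, not the paper's procedure.

```python
def canonical_forms(trace):
    """Return (hardwired terms in order, canonical form of every term).

    A canonical form is a dict {index of hardwired term: exponent}.
    """
    hardwired = []
    can = {}

    def lookup(term):
        # A term first seen as an oracle *input* is a hardwired term "u^1".
        if term not in can:
            hardwired.append(term)
            can[term] = {len(hardwired) - 1: 1}
        return can[term]

    for op, inputs, output in trace:
        forms = [lookup(u) for u in inputs]
        if op == "mul":
            result = dict(forms[0])
            for j, a in forms[1].items():
                result[j] = result.get(j, 0) + a
        else:  # "inv"
            result = {j: -a for j, a in forms[0].items()}
        # Only the first appearance of a term determines its canonical form.
        can.setdefault(output, result)
    return hardwired, can


def canonical_matrix(trace, terms):
    """Rows are the exponent vectors a_{i,j} of the requested terms."""
    hardwired, can = canonical_forms(trace)
    return [[can[u].get(j, 0) for j in range(len(hardwired))] for u in terms]


# Example trace with symbolic labels: a = v1*v2, b = a^{-1}, c = b*v3, d = c*v1.
example_trace = [
    ("mul", ("v1", "v2"), "a"),
    ("inv", ("a",), "b"),
    ("mul", ("b", "v3"), "c"),
    ("mul", ("c", "v1"), "d"),
]
example_matrix = canonical_matrix(example_trace, ["a", "d"])
```

In the example, `d` simplifies to $v_2^{-1} v_3$ because the two occurrences of `v1` cancel, which is exactly the kind of information the matrix records.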


We actually care for the rank of the canonical-form matrix of the terms output 
by a circuit C, which shows if there exists an output term which can be expressed 
as a product of powers of the other output terms. This would imply that if we 
know the e-th roots of the latter then we can compute the e-th root of the former. 
Jumping forward, we will exploit this property of the canonical-form matrix to 
see if a query is degenerated. 

We are finally ready to define Forgerg. 


Algorithm 4 ($\mathsf{Forger}_G$)

Input: $q = (n,e,h,\{\sigma_i\}_{i\in[t]})$, where n, e and $\{\sigma_i\}_{i\in[t]}$ are integers, and h is an
oracle-aided circuit.

Operation:


1. If $e \notin \mathbb{P}$, $|h|\,(= |\mathrm{desc}(h)|) > t$ or for some $i \in [t]$ $h(i) \notin \mathbb{Z}_n^*$ or
$h^G(i) \neq \sigma_i^e$ $[G_n]$, return $\perp$.

2. Let $M = M^{H,G,n}(\{h(i)\}_{i\in[t]})$ according to Definition 7, where H is the
oracle-aided circuit that first evaluates $h^G(1),\ldots,h^G(t)$ and then queries $G_n$
on the answers (say asking for their inverses).

If $\mathrm{rank}_e M < t$, return $\perp$.
3. Return $(h^G(0))^d$ $[G_n]$, where $d = e^{-1} \bmod \phi(n)$.


That is, $\mathsf{Forger}_G$ first checks that $\{\sigma_i\}_{i\in[t]}$ are valid signatures for the messages
$\{1,\ldots,t\}$ (with respect to G and the public key (n,e,h)) and that forging a
signature for this public key is not easy (reflected by $\mathrm{rank}_e M = t$). If satisfied,
$\mathsf{Forger}_G$ forges a signature for 0.
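
The rank condition is taken modulo the prime e, which is a routine computation. A minimal sketch of it via Gaussian elimination over $\mathbb{Z}_e$ follows; the function name is hypothetical and e is assumed to be prime, as guaranteed by the first check of the algorithm.

```python
def rank_mod_prime(matrix, e):
    """Rank of an integer matrix over the field Z_e (e is assumed prime)."""
    rows = [[x % e for x in row] for row in matrix]
    num_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(num_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Scale the pivot row so that the pivot entry becomes 1.
        inverse = pow(rows[rank][col], -1, e)
        rows[rank] = [(x * inverse) % e for x in rows[rank]]
        # Clear the column in every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % e for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank
```

For instance, a matrix all of whose entries are multiples of e (as happens when every $h(i)$ is an explicit e-th power of hardwired terms) has rank 0 modulo e, so such a query is rejected.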

Below we describe the PPT F that uses $\mathsf{Forger}_G$ for breaking the security
of $\Sigma^G$.


3.2 The Breaker F


The strategy of the algorithm F that uses Forger for breaking the security of
$\Sigma^G$ is simple: on input (n,e,h) it would like to use Forger on $(n,e,h,\{\sigma_i =
\mathsf{Sign}^G(sk,pk,i)\}_{i\in[t]})$ to forge the signature of 0. It might be the case, however,
that Forger returns bottom on such input. Hence, F first checks by itself
(without using Sign or Forger) whether Forger will return bottom on this input.
If positive, it uses a straightforward approach (see below) for forging a signature
of a message $k \in [t]$, without using Forger at all.


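Algorithm 5 (F)

Input: pk = (n,e,h)

Oracles: $G \in \mathcal{G}$, $\mathsf{Sign}^G(sk,pk,\cdot)$ and $\mathsf{Forger}_G$.
Operation:


1. Let $t = |h|$ and let $M = M^{H,G,n}(\{h^G(i)\}_{i\in[t]})$ according to Definition 7,
where H is as in Algorithm 4 (with respect to this h and t).
2. If $\mathrm{rank}_e(M) = t$, return $\mathsf{Forger}_G(n,e,h,\{\mathsf{Sign}^G(sk,pk,i)\}_{i\in[t]})$.
Otherwise,
(a) Using Gaussian Elimination find $k \in [t]$ and a set $\{\lambda_i \in [e]\}_{i\in[t]\setminus\{k\}}$
such that for every $j \in [\ell]$ it holds that $M_{k,j} = \sum_{i\in[t]\setminus\{k\}} \lambda_i \cdot M_{i,j} \bmod e$.
(b) Let $\gamma = \prod_{j\in[\ell]} v_j^{(M_{k,j} - \sum_{i\in[t]\setminus\{k\}} \lambda_i M_{i,j})/e}$ $[G_n]$, where $\{v_j\}_{j\in[\ell]} =
\mathrm{HardWired}_{H,G,n}$ (see Definition 5).
(c) For every $i \in [t]\setminus\{k\}$, let $\sigma_i = \mathsf{Sign}^G(sk,pk,i)$ $(= h(i)^d$ $[G_n])$.
(d) Return $\sigma_k = \gamma \cdot \prod_{i\in[t]\setminus\{k\}} \sigma_i^{\lambda_i}$ $[G_n]$.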
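
Steps (b) to (d) can be checked numerically. The sketch below assumes the identity renaming, the toy key n = 3233, e = 17, d = 2753, two hardwired terms $v = (2, 3)$ and rows $M_1 = (1, 1)$, $M_2 = (18, 1)$, so that $M_2 \equiv M_1 \pmod e$; the function name and all concrete values are illustrative only.

```python
def forge_dependent(n, e, hardwired, matrix, k, lambdas, signatures):
    """Forge sigma_k from signatures of the other rows.

    lambdas maps row index i to lambda_i with
    M[k][j] = sum_i lambda_i * M[i][j] (mod e) for every column j.
    """
    gamma = 1
    for j, v_j in enumerate(hardwired):
        diff = matrix[k][j] - sum(lam * matrix[i][j] for i, lam in lambdas.items())
        assert diff % e == 0, "row k must depend on the other rows modulo e"
        gamma = (gamma * pow(v_j, diff // e, n)) % n
    sigma_k = gamma
    for i, lam in lambdas.items():
        sigma_k = (sigma_k * pow(signatures[i], lam, n)) % n
    return sigma_k


# Toy instance (illustrative assumption): n = 61 * 53, e = 17, d = e^{-1} mod phi(n).
n, e, d = 3233, 17, 2753
hardwired = [2, 3]
matrix = [[1, 1], [18, 1]]  # h(1) = 2 * 3, h(2) = 2^18 * 3

hashes = []
for row in matrix:
    value = 1
    for v_j, exponent in zip(hardwired, row):
        value = (value * pow(v_j, exponent, n)) % n
    hashes.append(value)

signatures = {0: pow(hashes[0], d, n)}  # obtained from the signing oracle
forged = forge_dependent(n, e, hardwired, matrix, k=1, lambdas={0: 1}, signatures=signatures)
```

The forged value satisfies $\sigma_k^e = h(k)$ although the signer was never queried on message k, which is the reason a rank-deficient h cannot yield a secure scheme.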


The following lemma is immediate; its proof is omitted and can be found
in the full version of this paper [14]:


Lemma 3. For every $G \in \mathcal{G}$, $F^{G,\mathsf{Forger}_G}$ breaks the security of $\Sigma^G$.


3.3 The Emulator 


Our task is to emulate a family of circuits $\{C_k\}$ with oracle access to $G \in \mathcal{G}$
and $\mathsf{Forger}_G$, using only oracle access to G. We assume without loss of generality
that $|C_k| > k$ (otherwise we emulate a padded version of this family) and omit
k from the input parameter list of the emulator. We also assume without loss of
generality that before calling $\mathsf{Forger}_G$ on input $(n,e,h,\{\sigma_i\}_{i\in[t]})$, $C_k$ first queries
G on $\{\sigma_i\}$ (otherwise, we will emulate the circuit $C'_k$ that does so).

Given a circuit C, $\mathsf{Emul}^G(\mathrm{desc}(C))$ emulates the execution of the circuit
$C^{G,\mathsf{Forger}_G}$ by forwarding the G-calls to G, and answering the $\mathsf{Forger}_G$-calls us-
ing the following method: let $q = (n,e,h,\{\sigma_i\}_{i\in[t]})$ be a query that C makes to
$\mathsf{Forger}_G$. Emul first checks whether $\mathsf{Forger}_G$ returns bottom on this call (which
it can do efficiently), and if positive returns bottom to C as well. Otherwise,
Emul uses the query q and the description of C to factor n, and then uses this
factorization to answer the query efficiently.

The interesting question is how Emul can use such a pair (C,q) to factor n ef-
ficiently. Let H and $M^H = M^{H,G,n}(\{h(i)\}_{i\in[t]})$ be as computed by $\mathsf{Forger}_G(q)$, and
let $M^{(H;C)} = M^{(H;C),G,n}(\{\sigma_i\}_{i\in[t]}) \in \mathbb{Z}^{t\times\ell'}$, where the circuit (H;C) first eval-
uates H and then C.⁹ Namely, $M^H$ represents the canonical form of $\{h(i)\}_{i\in[t]}$
induced by the (stand alone) computation of H, where $M^{(H;C)}$ represents the
canonical form of the “signatures” $\{\sigma_i\}_{i\in[t]}$ induced by the computation of
(H;C). Since (H;C) first starts by computing H, it follows that every hard-
wired term $u \in \mathrm{HardWired}_{H,G,n} \cap \mathrm{HardWired}_{(H;C),G,n}$ has the same index with
respect to both ordered sets $\mathrm{HardWired}_{H,G,n}$ and $\mathrm{HardWired}_{(H;C),G,n}$. Hence,
the promise that $\sigma_i^e = h(i)$ $[G_n]$ for every $i \in [t]$ yields the following with
respect to $\{v_j\}_{j\in[\ell']} = \mathrm{HardWired}_{(H;C),G,n}$:


$$\prod_{j\in[\ell]} v_j^{M^H_{i,j}} = \Bigl(\prod_{j\in[\ell']} v_j^{M^{(H;C)}_{i,j}}\Bigr)^{e} \quad [G_n]$$

for every $i \in [t]$. Since $G_n$ is selected at random, (at least intuitively) C could
have satisfied the above equations only if they hold regardless of the choice of
$G_n$. Namely, it is the case that


$$\sum_{j\in[\ell]} M^H_{i,j} = e \cdot \sum_{j\in[\ell']} M^{(H;C)}_{i,j} \bmod \phi(n) \qquad (1)$$


for every $i \in [t]$. On the other hand, the assumption that $\mathsf{Forger}_G(q) \neq \perp$ yields
that $\mathrm{rank}_e M^H = t$. Therefore, Equation (1) is “far” from being satisfied modulo
e. In our proof we show how to use this inconsistency to find a multiple of $\phi(n)$,
and thus to factor n.

The following description of Emul realizes the above discussion. We start by
recalling the following known factoring algorithms. The first one is useful for
small n’s (for which the above discussion does not hold), and the second one
factors arbitrarily large n, given a multiple of $\phi(n)$ as an advice.


Theorem 6 (factoring small numbers, [11, 34]). There exists a procedure
Sef that on input $n \in \mathbb{N}$, runs in time $2^{O(\sqrt{\log n \log\log n})}$ and factors n with con-
stant probability.


Lemma 4 (factoring using multiple of $\phi(n)$). We say that $z = (z_1,z_2) \in
\mathbb{Z} \times \mathbb{N}$ is a factoring advice for $n \in \mathbb{N}$, if $z_1^{\lceil\log n\rceil} \cdot \prod_{p\in\mathbb{P}\colon p\le z_2} p^{\lceil\log n\rceil}$ is a non-zero
multiple of $\phi(n)$.

There exists a procedure Factor that on input $(n,z_1,z_2)$, runs in time
$\mathrm{poly}(z_2) \cdot \mathrm{poly}(\log|nz_1|)$, and factors n with constant probability, assuming that
$z = (z_1,z_2)$ is a factoring advice for n.


Proof. We use the following known algorithm due to Miller [24].


Theorem 7 (Miller’s algorithm [24, 34]). There exists a procedure that on
input $n \in \mathbb{N}$ and $\mu \in \mathbb{Z}$, runs in time $\mathrm{poly}(\log|n\mu|)$, and if $\mu$ is a non-zero
multiple of $\phi(n)$, it factors n with constant probability.


By definition $\mu = z_1^{\lceil\log n\rceil} \cdot \prod_{p\in\mathbb{P}\colon p\le z_2} p^{\lceil\log n\rceil}$ is a non-zero multiple of $\phi(n)$. Thus,
Miller’s algorithm on input $(n,\mu)$ runs in time $\mathrm{poly}(\log|n\mu|) = \mathrm{poly}(z_2 \cdot \log|nz_1|)$
and factors n with constant probability. Finally, note that $\mu$ is easily computable
in time $\mathrm{poly}(z_2, \log n)$.


9 Recall that we allow circuits to have a predetermined evaluating order.
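
The idea behind factoring from a multiple of $\phi(n)$ can be illustrated with the standard randomized square-root-of-unity search. The sketch below is in the spirit of Miller's algorithm but is not claimed to be the procedure of [24]; the function name, the retry count and the seeded random generator are assumptions made for the example, and n is assumed to be an odd composite.

```python
import math
import random


def factor_with_phi_multiple(n, mu, tries=64, seed=0):
    """Return a non-trivial factor of an odd composite n, given a non-zero
    multiple mu of phi(n), or None if all tries fail (each try succeeds with
    probability at least 1/2 when n has two distinct odd prime factors)."""
    mu = abs(mu)
    if mu == 0:
        raise ValueError("mu must be a non-zero multiple of phi(n)")
    # Write mu = 2^s * r with r odd.
    s, r = 0, mu
    while r % 2 == 0:
        r //= 2
        s += 1
    rng = random.Random(seed)
    for _ in range(tries):
        a = rng.randrange(2, n - 1)
        g = math.gcd(a, n)
        if g > 1:
            return g  # lucky draw: a already shares a factor with n
        x = pow(a, r, n)
        for _ in range(s):
            y = (x * x) % n
            if y == 1 and x not in (1, n - 1):
                # x is a non-trivial square root of 1, so gcd(x - 1, n) splits n.
                return math.gcd(x - 1, n)
            x = y
    return None
```

For example, `factor_with_phi_multiple(3233, 7 * 3120)` is expected to return 53 or 61, since 3120 = $\phi(3233)$.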


We are now finally ready to define Emul. 


Algorithm 8 (Emul) 
Input: The description of an oracle-aided circuit C. 
Oracle: $G \in \mathcal{G}$.
Operation:
Emulate $C^G$ while on every query $q = (n,e,h,\{\sigma_i\}_{i\in[t]})$ to $\mathsf{Forger}_G$, return the
following value to C:


1. If $\mathsf{Forger}_G$ would return $\perp$ on q, return $\perp$ as well (and continue to the next
query). Else,

2. Try to factor n by doing the following for |C| times: 
Ifn< |C] Bete ToT execute Sef(n). 
Otherwise, execute Factor(n, det(Qc,c,q), |C|*), where Qc,G.q is according to 
Definition 8.

3. If factoring of n is successful, return $h^G(0)^d$ $[G_n]$, where $d = e^{-1}
\bmod \phi(n)$.


Otherwise, abort. 


The matrix Qc,G,q is defined as follows: 


Definition 8 (query matrix). Let C be an oracle-aided circuit, $G \in \mathcal{G}$ and
let $q = (n,e,h,\{\sigma_i\}_{i\in[t]})$ be the query asked by $C^{G,\mathsf{Forger}_G}$ to $\mathsf{Forger}_G$. The matrix
$Q_{C,G,q} \in \mathbb{Z}^{t\times t}$ is defined as follows:


1. If $\mathsf{Forger}_G(q) = \perp$, set $Q_{C,G,q} = 0^{t\times t}$.
Otherwise:

2. Let $M^H = M^{H,G,n}(\{h(i)\}_{i\in[t]})$ according to Definition 7, where H is as in
Algorithm 4 with respect to this h and t. (Since $\mathsf{Forger}_G(q) \neq \perp$, the matrix
$M^H$ is well defined and of rank t.)

3. Let $T \subseteq [\ell]$ be the first subset of size t (from hereafter we assume some
arbitrary order on such sets) with $\mathrm{rank}_e(M^H_T) = t$.¹⁰

4. Let $M^{(H;C)} \in \mathbb{Z}^{t\times\ell'}$ be the matrix $M^{(H;C),G,n}(\{\sigma_i\}_{i\in[t]})$ according to Defini-
tion 7, where (H;C) is the circuit that first evaluates H and then evaluates
C.

5. Set $Q_{C,G,q} = M^H_T - e \cdot M^{(H;C)}_T$.


Note that in the code of Emul if Sef is called, and thus n is small, then it runs 
in time poly(|C|). In addition, the running time of Factor, if called, is also in 
poly(|C|). Thus, Emul runs in polynomial time. 


10 Remember that $M^{H}_{\mathcal{I}} \in \mathbb{Z}^{t \times t}$ is the restriction of $M^{H}$ to the columns in $\mathcal{I}$.

Moreover, it is clear that the only case where the output of $\mathsf{Emul}^{G}(\mathrm{desc}(C))$
differs from the output of $C$ is when the former aborts. This means that for
some query of $C$ to Forger, the latter would not return $\perp$, but either (1) Sef
failed, or (2) $z$ was a factoring advice but Factor failed, or (3) $z$ was not a
factoring advice for $n$. As the first two cases happen with negligible probability
(by Theorem ld and Lemma W, we only have to prove that the latter happens 
with negligible probability. 

This is formally done in the following lemma, whose proof (done via the "short
description paradigm") can be found in the full version of this paper [14].


Lemma 5. A query $q = (n, \cdot)$ to Forger made by $C^{G,\mathsf{Forger}_G}$ is unexpected, if

- $\mathsf{Forger}_G(q) \neq \perp$,

— n> |C, and 

- $\det(Q_{C,G,q})$ is not a factoring advice for $n$, where $Q_{C,G,q}$ is according
to Definition 8.


The following holds for any oracle-aided circuit C: 
$\Pr_{G \leftarrow \mathcal{G}}[C^{G,\mathsf{Forger}_G} \text{ asks Forger an unexpected query}] \le \delta(|C|)$,


where 6(|C|) = Q- log |C], 


3.4 Putting It Together 


Proof (of Lemmalð). Lemma [3] yields that FCForsere breaks the security of XS 

with respect to every $G \in \mathcal{G}$, so it is left to prove that $\mathsf{Emul}^{G}(C_k)$ emulates
$C_k^{G,\mathsf{Forger}_G}$ well.

Recall that |C,| € poly(k), and that we assume without loss of generality that 
|Ck| > k. Theorem [d] and Lemma f yield that Emul(C;,) answers all “expected” 
queries of Ck to Forger with probability 1 — |C;| - 272) = 1 — neg(k), where 
Lemma 5 yields that $C_k$ asks unexpected queries with only negligible probability
over the choice of $G \in \mathcal{G}$. Hence, with all but negligible probability, $\mathsf{Emul}^{G}(C_k)$
emulates $C_k^{G,\mathsf{Forger}_G}$ correctly.


Acknowledgments. We thank Nir Bitansky, Thomas Holenstein and Ilya 
Mironov for very helpful conversations. 


References 


1. Aggarwal, D., Maurer, U.: Breaking RSA Generically Is Equivalent to Factoring. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 36-53. Springer, Heidelberg (2009)

2. Bellare, M., Boldyreva, A., Palacio, A.: An Uninstantiable Random-Oracle-Model Scheme for a Hybrid-Encryption Problem. In: Cachin, C., Camenisch, J. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 171-188. Springer, Heidelberg (2004)

3. Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for designing efficient protocols. In: ACM Conference on Computer and Communications Security, pp. 62-73 (1993)

4. Bellare, M., Rogaway, P.: Optimal Asymmetric Encryption. In: De Santis, A. (ed.) EUROCRYPT 1994. LNCS, vol. 950, pp. 92-111. Springer, Heidelberg (1995)

5. Boldyreva, A., Fischlin, M.: Analysis of Random Oracle Instantiation Scenarios for OAEP and Other Practical Schemes. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 412-429. Springer, Heidelberg (2005)

6. Brown, J., González Nieto, J.M., Boyd, C.: Efficient CCA-Secure Public-Key Encryption Schemes from RSA-Related Assumptions. In: Barua, R., Lange, T. (eds.) INDOCRYPT 2006. LNCS, vol. 4329, pp. 176-190. Springer, Heidelberg (2006)

7. Canetti, R., Goldreich, O., Halevi, S.: On the Random-Oracle Methodology as Applied to Length-Restricted Signature Schemes. In: Naor, M. (ed.) TCC 2004. LNCS, vol. 2951, pp. 40-57. Springer, Heidelberg (2004)

8. Canetti, R., Goldreich, O., Halevi, S.: The random oracle methodology, revisited. JACM: Journal of the ACM, 51 (2004)

9. Coron, J.-S.: On the Exact Security of Full Domain Hash. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 229-235. Springer, Heidelberg (2000)

10. Cramer, R., Shoup, V.: Signature schemes based on the strong RSA assumption. ACM Trans. Inf. Syst. Secur. 3(3), 161-185 (2000)

11. Dixon, J.D.: Asymptotically fast factorization of integers. Mathematics of Computation 36, 255-260 (1981)

12. Dodis, Y., Oliveira, R., Pietrzak, K.: On the Generic Insecurity of the Full Domain Hash. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 449-466. Springer, Heidelberg (2005)

13. Dodis, Y., Reyzin, L.: On the Power of Claw-Free Permutations. In: Cimato, S., Galdi, C., Persiano, G. (eds.) SCN 2002. LNCS, vol. 2576, pp. 55-73. Springer, Heidelberg (2003)

14. Dodis, Y., Haitner, I., Tentes, A.: On the instantiability of hash-and-sign RSA signatures

15. Gennaro, R., Gertner, Y., Katz, J., Trevisan, L.: Bounds on the efficiency of generic cryptographic constructions. SIAM Journal on Computing 35(1), 217-246 (2005)

16. Gennaro, R., Halevi, S., Rabin, T.: Secure Hash-and-Sign Signatures without the Random Oracle. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 123-139. Springer, Heidelberg (1999)

17. Gennaro, R., Trevisan, L.: Lower bounds on the efficiency of generic cryptographic constructions. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 305-313. IEEE Computer Society (2000)

18. Goldwasser, S., Tauman-Kalai, Y.: On the (in)security of the Fiat-Shamir paradigm. In: Proceedings of the 44th Annual Symposium on Foundations of Computer Science (FOCS), pp. 102-113. IEEE Computer Society (2003)

19. Haitner, I., Hoch, J.J., Reingold, O., Segev, G.: Finding collisions in interactive protocols - A tight lower bound on the round complexity of statistically-hiding commitments. In: Proceedings of the 48th Annual Symposium on Foundations of Computer Science (FOCS), pp. 669-679. IEEE Computer Society (2007)

20. Haitner, I., Holenstein, T.: On the (Im)Possibility of Key Dependent Encryption. In: Reingold, O. (ed.) TCC 2009. LNCS, vol. 5444, pp. 202-219. Springer, Heidelberg (2009)

21. Hofheinz, D., Jager, T., Kiltz, E.: Short Signatures From Weaker Assumptions. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 647-666. Springer, Heidelberg (2011)

22. Hohenberger, S., Waters, B.: Short and Stateless Signatures from the RSA Assumption. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 654-670. Springer, Heidelberg (2009)

23. Maurer, U.M.: Abstract models of computation in cryptography. In: IMA Int. Conf., pp. 1-12 (2005)

24. Miller, G.L.: Riemann's hypothesis and tests for primality. Journal of Computer and System Sciences 13(3), 300-317 (1976)

25. Nechaev, V.I.: Complexity of a determinate algorithm for the discrete logarithm. MATHNASUSSR: Mathematical Notes of the Academy of Sciences of the USSR, 55 (1994)

26. Nielsen, J.B.: Separating Random Oracle Proofs from Complexity Theoretic Proofs: The Non-committing Encryption Case. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 111-126. Springer, Heidelberg (2002)

27. Paillier, P.: Impossibility Proofs for RSA Signatures in the Standard Model. In: Abe, M. (ed.) CT-RSA 2007. LNCS, vol. 4377, pp. 31-48. Springer, Heidelberg (2006)

28. Paillier, P., Vergnaud, D.: Discrete-Log-Based Signatures May Not Be Equivalent to Discrete Log. In: Roy, B. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 1-20. Springer, Heidelberg (2005)

29. Paillier, P., Villar, J.L.: Trading One-Wayness Against Chosen-Ciphertext Security in Factoring-Based Encryption. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 252-266. Springer, Heidelberg (2006)

30. Pietrzak, K.: Compression from Collisions, or Why CRHF Combiners Have a Long Output. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 413-432. Springer, Heidelberg (2008)

31. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21(2), 120-126 (1978)

32. RSA Laboratories, Redwood City, California. PKCS #1: RSA Encryption Standard (November 1993)

33. Shoup, V.: Lower Bounds for Discrete Logarithms and Related Problems. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 256-266. Springer, Heidelberg (1997)

34. Shoup, V.: Computational Introduction to Number Theory and Algebra. Cambridge University Press (2005)

35. Wee, H.: One-Way Permutations, Interactive Hashing and Statistically Hiding Commitments. In: Vadhan, S.P. (ed.) TCC 2007. LNCS, vol. 4392, pp. 419-433. Springer, Heidelberg (2007)


Beyond the Limitation of Prime-Order Bilinear 
Groups, and Round Optimal Blind Signatures 


Jae Hong Seo¹ and Jung Hee Cheon²

¹ National Institute of Information and Communications Technology, Tokyo, Japan
jaehong@nict.go.jp

² ISaC & Dep. of Mathematical Sciences, Seoul National University, Seoul, Korea
jhcheon@snu.ac.kr


Abstract. At Eurocrypt 2010, Freeman proposed a transformation from 
pairing-based schemes in composite-order bilinear groups to equivalent 
ones in prime-order bilinear groups. His transformation can be applied 
to pairing-based cryptosystems exploiting only one of two properties of 
composite-order bilinear groups: cancelling and projecting. At Asiacrypt 
2010, Meiklejohn, Shacham, and Freeman showed that prime-order bilinear
groups according to Freeman's construction cannot have the two properties
simultaneously except with negligible probability and, as an instance
of implausible conversion, proposed a (partially) blind signature scheme
whose security proof exploits both the cancelling and projecting properties
of composite-order bilinear groups.

In this paper, we invalidate their evidence by presenting a security 
proof of the prime-order version of their blind signature scheme. Our 
security proof follows a different strategy and exploits only the projecting 
property. Instead of the cancelling property, a new property, that we call 
translating, on prime-order bilinear groups plays an important role in 
the security proof, whose existence was not known in composite-order 
bilinear groups. With this proof, we obtain a 2-move (i.e., round optimal) 
(partially) blind signature scheme (without random oracle) based on the 
decisional linear assumption in the common reference string model, which 
is of independent interest. 

As the second contribution of this paper, we construct prime-order
bilinear groups that possess both the cancelling and projecting properties
at the same time by considering more general base groups. That is, we
take a rank $n$ $\mathbb{Z}_p$-submodule of $\mathbb{Z}_p^{n^2}$, instead of $\mathbb{Z}_p^{n}$, to be a base group
$G$, and consider the projections into its rank 1 submodules. We show
that the subgroup decision assumption on this base group $G$ holds in
the generic bilinear group model for $n = 2$, and provide an efficient
membership-checking algorithm for $G$, which was trivial in the previous
setting. Consequently, it is still open whether there exists a cryptosystem
on composite-order bilinear groups that cannot be constructed on
prime-order bilinear groups.


1 Introduction 


Since Boneh, Goh, and Nissim [10] introduced composite-order bilinear groups in 
2005, they have been used to solve many challenging problems in cryptography. 


R. Cramer (Ed.): TCC 2012, LNCS 7194, pp. 133-150, 2012.

Cryptographic systems using composite-order bilinear groups mostly utilize one
of two properties, called cancelling and projecting, which Freeman [17] identified.
(Though Freeman named the two properties only recently, they had already been
used before.) The security of almost all cryptosystems using composite-order
bilinear groups is based on the subgroup decision assumption, introduced by
Boneh, Goh, and Nissim [10], or its variants.

Recently, some literature has aimed at constructing mathematical structures
using prime-order bilinear groups with properties similar to (or richer than)
those of composite-order bilinear groups [33,24,17,19]. In particular, Freeman [17]
proposed two product groups of prime-order bilinear groups with separately defined
bilinear maps. He showed that the two proposed product groups satisfy the
subgroup decision assumption (in the sense that given $g$, it is infeasible to determine
whether $g$ is in a subgroup or the whole product group), and that the two product
groups, each with its bilinear map, satisfy cancelling and projecting, respectively. One direct
benefit of this approach is improved efficiency of group operations and pairing
computations. Loosely speaking, in bilinear groups of composite order, the
group order $N$ must be infeasible to factor, so group operations and pairing
computations are less efficient than those of bilinear groups of prime order for
the same security level. See for detailed efficiency comparison between 
composite-order groups and prime-order groups. 

On the other hand, Meiklejohn, Shacham, and Freeman [30] gave a negative
result: evidence that, within a certain class of bilinear groups constructed
on prime-order bilinear groups, no group can have both the cancelling and projecting
properties. To impart meaning to their result, they also proposed
a round optimal blind signature scheme in composite-order bilinear groups
whose security proof exploits both the cancelling and projecting properties of the
composite-order bilinear group.¹ Their round optimal blind signature scheme is
of independent interest since it is the first practical scheme of this type based on
static assumptions (not based on q-type assumptions) in the common reference
string model. They left two open questions: (1) whether the instantiation in
prime-order groups of their round optimal blind signature scheme is provably secure or
insecure, and (2) whether their limitation result can be applied to a wider class of
bilinear groups constructed from prime-order groups.

In this paper, we answer both questions. We propose a (partially) blind
signature scheme in a prime-order bilinear group setting. The proposed scheme can
be considered as an adaptation of the scheme in [30] to the prime-order
group setting. However, we prove the one-more unforgeability of the proposed
scheme by using a completely different strategy from [30]. Our proof does not
require the cancelling property, and instead we use another property, which we
call translating, on prime-order groups. Informally, the translating property is
that given $g_1, g_1^{a} \in G_1$ and $g_2 \in G_2$, where $G_1$ and $G_2$ are distinct subgroups of
$G$, there exists a map $T$ outputting $g_2^{a}$. The translating property is used, in


1 The scheme in itself does not use cancelling and projecting. Only the proof of
security uses both the cancelling and projecting properties. Thus, the authors do not rule
out the existence of a different proof strategy.

an essential way, to prove the one-more unforgeability of the proposed scheme.
With this proof, we obtain a round optimal (partially) blind signature scheme
(without relying on the random oracle heuristic) based on the decisional linear
assumption in the common reference string model, which is of independent
interest. Our blind signature scheme is more efficient than [30]. For example, our
scheme has a shorter signature size (six elements in the prime-order group vs.
two elements in the composite-order group). Moreover, the security of our blind
signature scheme does not rely on the factoring assumption. (The blindness of
the signature scheme in [30] is based on the subgroup hiding assumption, which
requires that factoring the group order $N$ is infeasible.)

As the second contribution, we show that there exists a more general class of
bilinear groups than Meiklejohn, Shacham, and Freeman considered, and some
of these can be both cancelling and projecting. That is, we take a rank $n$
$\mathbb{Z}_p$-submodule of $\mathbb{Z}_p^{n^2}$, instead of $\mathbb{Z}_p^{n}$, to be a base group $G$, and consider the
projections into its rank 1 submodules. In this case, we should carefully consider group
membership tests of a subgroup. We provide an efficient membership-checking
algorithm for $G$, which was trivial in the previous setting, and we show that the
subgroup decision assumption on this base group $G$ holds in the generic bilinear
group model for $n = 2$. Consequently, it is still open whether there exists
a cryptosystem on composite-order bilinear groups that cannot be constructed
on prime-order bilinear groups.

We note that although we construct a structure satisfying both cancelling and
projecting, our construction cannot be applied directly to the scheme in [30]
to transform it to the prime-order setting. The proof of [30] uses the property of
composite-order groups that the orders of two subgroups are relatively prime.
Our construction does not support this property, so we could not apply our
construction to the round optimal blind signature scheme in [30].


Related Work: Blind Signatures. Since Chaum [11,12] introduced
the concept of blind signatures in 1982, they have been studied extensively
because of their numerous applications, such as electronic voting and
electronic cash [14]. Blind signatures are interactive protocols between a user
and a signer. Informally, the user can obtain a signature (signed by the signer)
on a message (chosen by the user) without revealing to the signer the message
being signed; that is, the signer learns nothing about the message after finishing the
protocol.
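The blind/sign/unblind flow described above can be sketched with textbook RSA blinding in the style of Chaum's original idea. This is only an illustration of the interaction pattern, not the pairing-based scheme of this paper; the function names and the toy key are hypothetical, and the sketch omits hashing and padding.

```python
import math


def blind(m: int, r: int, e: int, n: int) -> int:
    """User: hide the message m with a random blinding factor r."""
    return (m * pow(r, e, n)) % n


def sign(x: int, d: int, n: int) -> int:
    """Signer: sign whatever value is received, without seeing m."""
    return pow(x, d, n)


def unblind(blind_sig: int, r: int, n: int) -> int:
    """User: strip the blinding factor to obtain a signature on m."""
    return (blind_sig * pow(r, -1, n)) % n


def verify(m: int, sig: int, e: int, n: int) -> bool:
    return pow(sig, e, n) == m % n


# Toy RSA key (n = 61 * 53); far too small for real use.
n, e, d = 3233, 17, 2753
m, r = 1234, 7
assert math.gcd(r, n) == 1       # r must be invertible modulo n

blinded = blind(m, r, e, n)      # message 1: user -> signer
blind_sig = sign(blinded, d, n)  # message 2: signer -> user
sig = unblind(blind_sig, r, n)
```

The exchange takes two moves, which is the sense in which such protocols are called round optimal.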

In particular, round optimal (i.e., 2-move) blind signature schemes have
received attention since the round complexity is an important measure of
efficiency in computer networks, and round optimal blind signature schemes
directly imply that they are concurrently secure. In the random oracle
model, there are elegant round optimal blind signatures by Chaum [12] and
Boldyreva [8]. Without relying on the random oracle heuristic, there is an
approach using general NIZKs for NP, and its security depends on the assumption
that a common reference string exists [16,5]. Very recently, Garg et al. proposed
the first round optimal blind signature in the standard model (without random
oracle and a setup assumption such as a common reference string) [20]. These
approaches without random oracle, however, are not as efficient as the approach
in which we are interested, which uses a bilinear map [9,10].

In recent years several efficient round optimal blind signatures [18,4,2,30,3] have
been proposed in the common reference string model, using a bilinear map, by
combining signature schemes with efficient NIWI proofs [23,22,24]. These
approaches using a bilinear map either rely on q-type dynamic assumptions [18,4,2,3]
or work in the composite-order group [30]. Though there is an analysis of a
family of q-type dynamic assumptions by Cheon [15], the security of q-type
assumptions still remains obscure. (The q-type assumptions used in the above schemes
hold in the generic group model [35], which can be strong evidence for believing
such assumptions. However, we believe that as the next step, constructing schemes
without relying on such strong assumptions is an encouraging research approach.)
In [30], a round optimal blind signature scheme based on static assumptions (not
on q-type assumptions) using composite-order groups is proposed.


2 Notations and Definitions 


Throughout this paper, we use the notation $\oplus$ for the internal direct product: for
an abelian group $G$, we write $G = G_1 \oplus G_2$ when $G_1$ and $G_2$ are subgroups of
$G$ and $G_1 \cap G_2 = \{1_G\}$ for the identity $1_G$ of $G$. In this case, every element $g$ in
$G$ can be uniquely written as $g = g_1 \cdot g_2$ for some $g_1 \in G_1$ and $g_2 \in G_2$, where
$\cdot$ is the group operation in $G$, and will sometimes be omitted. We use the notation
$x \xleftarrow{\$} A$: if $A$ is a group $G$, then it means that an element $x$ is randomly chosen
from $G$, and if $A$ is an algorithm, then it means that $A$ outputs $x$. $[i, j]$ denotes
the set of integers $\{i, \ldots, j\}$. We denote an abelian group generated by $g_1, \ldots, g_n$
by $\langle g_1, \ldots, g_n \rangle$.

We give formal definitions of bilinear group generators, and properties and 
cryptographic assumptions defined on the bilinear group. 


Definition 1. We say that $\mathcal{G}(\cdot,\cdot)$ is a bilinear group generator if it takes as
input a security parameter $\lambda$ and a positive integer $n \ge 1$, and it outputs a tuple
$(G, G_i, H, H_i, G_t, e, \sigma \mid i \in [1,n]) \xleftarrow{\$} \mathcal{G}(\lambda, n)$, where $G$, $H$, $G_t$ are finite abelian
groups, $G_i$ and $H_i$ are cyclic subgroups of $G$ and $H$ of the same order, respectively,
such that $G = \oplus_{i \in [1,n]} G_i$ and $H = \oplus_{i \in [1,n]} H_i$, and $e : G \times H \to G_t$ is a
non-degenerate bilinear map, that is, it satisfies

Bilinearity: $e(g_1 g_2, h_1 h_2) = e(g_1, h_1)\,e(g_1, h_2)\,e(g_2, h_1)\,e(g_2, h_2)$
for $g_1, g_2 \in G$ and $h_1, h_2 \in H$,

Non-degeneracy: for $g \in G$, if $e(g, h) = 1$ for every $h \in H$, then $g = 1$;
for $h \in H$, if $e(g, h) = 1$ for every $g \in G$, then $h = 1$,

and $\sigma$ is additional information for the group membership-check. Moreover, we
assume that group operations, random samplings, and membership-checks in each
group, and computation of $e$ can be efficiently performed (i.e., in polynomial time
in $\lambda$).

We do not exclude the case that $G = H$. When $G = H$, we say that $\mathcal{G}$ is a
symmetric bilinear group generator.


Definition 2. We say that an algorithm $\mathcal{G}_1$ is a bilinear group generator of
prime order if $\mathcal{G}_1(\lambda) = \mathcal{G}(\lambda, 1)$, and $\mathcal{G}_1$ outputs groups $G, G_1, H, H_1, G_t$ of prime
order $p$ and a map $e$. Then, $G = G_1$ and $H = H_1$. We denote the three distinct
groups $G$, $H$, $G_t$ by $\mathbb{G}$, $\mathbb{H}$, $\mathbb{G}_t$, respectively, and the bilinear map $e$ by $\hat{e}$.


Now, we provide definitions of two properties, called cancelling and projecting, 
which are introduced by Freeman [17]. 


Definition 3. A bilinear group generator $\mathcal{G}$ is cancelling if $e(g_i, h_j) = 1_t$
whenever $g_i \in G_i$, $h_j \in H_j$, and $i \neq j$, where $1_t$ is the identity of $G_t$.


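Definition 4. A bilinear group generator $\mathcal{G}$ is projecting if there exist subgroups
$G' \subset G$, $H' \subset H$, and $G_t' \subset G_t$, and non-trivial² homomorphisms $\pi : G \to G$,
$\bar{\pi} : H \to H$, and $\pi_t : G_t \to G_t$ such that

1. $G' \subseteq \ker(\pi)$, $H' \subseteq \ker(\bar{\pi})$, and $G_t' \subseteq \ker(\pi_t)$.
2. $\pi_t(e(g, h)) = e(\pi(g), \bar{\pi}(h))$ for all $g \in G$ and all $h \in H$.

If $\mathcal{G}$ is a symmetric bilinear group generator, that is, $G = H$, then set $G' = H'$
and $\pi = \bar{\pi}$.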


To prove the security of the proposed blind signature scheme, we need
two widely-known assumptions in the bilinear group setting: the Computational
Diffie-Hellman assumption, and the k-Linear assumption, which was introduced by
Hofheinz and Kiltz and by Shacham [26,34].


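Definition 5. Let $\mathcal{G}_1$ be a bilinear group generator of prime order. We define the
advantage of an algorithm $\mathcal{A}$ in solving the Computational Diffie-Hellman (CDH)
problem in $G$, denoted by $\mathsf{Adv}^{\mathrm{CDH}}_{\mathcal{A},\mathcal{G}_1}(\lambda)$, to be

\[
\Pr\left[\mathcal{A}(G, H, G_t, e, g, g^{a}, g^{b}) \to g^{ab} : (G, H, G_t, e) \xleftarrow{\$} \mathcal{G}_1,\ g \xleftarrow{\$} G,\ a, b \xleftarrow{\$} \mathbb{Z}_p\right].
\]

We say that $\mathcal{G}_1$ satisfies the Computational Diffie-Hellman (CDH) assumption in $G$
if for any PPT algorithm $\mathcal{A}$, $\mathsf{Adv}^{\mathrm{CDH}}_{\mathcal{A},\mathcal{G}_1}(\lambda)$ is a negligible function of $\lambda$.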


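Definition 6. Let $\mathcal{G}_1$ be a bilinear group generator of prime order and $k \ge 1$.
We define the advantage of an algorithm $\mathcal{A}$ in solving the k-Linear problem in
$G$, denoted by $\mathsf{Adv}^{k\text{-Lin}}_{\mathcal{A},\mathcal{G}_1}(\lambda)$, to be

\[
\begin{aligned}
\Big|\ &\Pr\Big[\mathcal{A}(G, H, G_t, e, g, u_i, u_i^{a_i}, g^{b}, h \text{ for } i \in [1,k]) \to 1 : \\
&\qquad (G, H, G_t, e) \xleftarrow{\$} \mathcal{G}_1,\ g, u_i \xleftarrow{\$} G,\ h \xleftarrow{\$} H,\ a_i \xleftarrow{\$} \mathbb{Z}_p \text{ for } i \in [1,k],\ b \xleftarrow{\$} \mathbb{Z}_p\Big] \\
-\ &\Pr\Big[\mathcal{A}(G, H, G_t, e, g, u_i, u_i^{a_i}, g^{b}, h \text{ for } i \in [1,k]) \to 1 : \\
&\qquad (G, H, G_t, e) \xleftarrow{\$} \mathcal{G}_1,\ g, u_i \xleftarrow{\$} G,\ h \xleftarrow{\$} H,\ a_i \xleftarrow{\$} \mathbb{Z}_p \text{ for } i \in [1,k],\ b = \textstyle\sum_{i \in [1,k]} a_i\Big]\ \Big|.
\end{aligned}
\]

2 The non-triviality does not appear in the original definition [17]. Without this, however,
every bilinear group can be projecting by using the trivial homomorphisms.

Then, we say that $\mathcal{G}_1$ satisfies the k-Linear assumption in $G$ if for any PPT
algorithm $\mathcal{A}$, the advantage $\mathsf{Adv}^{k\text{-Lin}}_{\mathcal{A},\mathcal{G}_1}(\lambda)$ is a negligible function of $\lambda$.
We can analogously define the CDH assumption and the k-Linear assumption in
$H$. The 1-Linear assumption in $G$ is the DDH assumption in $G$ and the 2-Linear
assumption in $G$ is the decisional linear assumption in $G$.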
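As a worked instance, and under the reading of Definition 6 in which the challenge exponent is either uniform or the sum of the $a_i$ (an assumption where the extracted text is damaged), setting $k = 2$ asks an adversary to distinguish the following two tuples:

```latex
\[
\bigl(g,\ u_1,\ u_2,\ u_1^{a_1},\ u_2^{a_2},\ g^{a_1 + a_2},\ h\bigr)
\quad\text{versus}\quad
\bigl(g,\ u_1,\ u_2,\ u_1^{a_1},\ u_2^{a_2},\ g^{b},\ h\bigr),
\qquad a_1, a_2, b \xleftarrow{\$} \mathbb{Z}_p .
\]
```

For $k = 1$ the same template collapses to $(g, u_1, u_1^{a_1}, g^{a_1}, h)$ versus $(g, u_1, u_1^{a_1}, g^{b}, h)$, which is a DDH instance.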

Next, we provide the definition of the subgroup decision assumption, adapted
from [17] to fit our purpose.


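Definition 7. Let $\mathcal{G}$ be a bilinear group generator. We define the advantage
of an algorithm $\mathcal{A}$ in solving the (n,k)-subgroup decision problem on the left,
denoted by $\mathsf{Adv}^{\mathrm{SD_L}}_{\mathcal{A},\mathcal{G}}(\lambda)$, to be

\[
\begin{aligned}
\Big|\ &\Pr\Big[\mathcal{A}(G, G', H, H', G_t, e, \sigma, g) \to 1 : \\
&\qquad (G, G_i, H, H_i, G_t, e, \sigma) \xleftarrow{\$} \mathcal{G}(\lambda, n),\ G' := \oplus_{i \in [1,k]} G_i,\ H' := \oplus_{i \in [1,k]} H_i,\ g \xleftarrow{\$} G\Big] \\
-\ &\Pr\Big[\mathcal{A}(G, G', H, H', G_t, e, \sigma, g') \to 1 : \\
&\qquad (G, G_i, H, H_i, G_t, e, \sigma) \xleftarrow{\$} \mathcal{G}(\lambda, n),\ G' := \oplus_{i \in [1,k]} G_i,\ H' := \oplus_{i \in [1,k]} H_i,\ g' \xleftarrow{\$} G'\Big]\ \Big|.
\end{aligned}
\]

We say that $\mathcal{G}$ satisfies the (n,k)-subgroup decision assumption on the left if for
any PPT algorithm $\mathcal{A}$, its advantage $\mathsf{Adv}^{\mathrm{SD_L}}_{\mathcal{A},\mathcal{G}}(\lambda)$ is a negligible function of $\lambda$.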


We analogously define the (n, k)-subgroup decision assumption on the right. 


Definition 8. We say that a bilinear group generator G(-,-) satisfies the (n, k)- 
subgroup decision assumption if G(-, n) satisfies both the (n, k)-subgroup decision 
assumptions on the left and on the right. 


We will often omit (n, k) term, if it is clear in the context. 


3 Round-Optimal Blind Signature in Prime-Order Group 


3.1 Symmetric Bilinear Group with Projecting Pairing 


We construct a symmetric bilinear group generator with the projecting property.
(Symmetric bilinear groups mean that $G = H$ and $G_i = H_i$ in our definition
of bilinear groups.) We borrow some notation from Freeman's paper [17]. Let $\mathbb{G}$
be a group, $g, g_1, \ldots, g_n$ be elements in $\mathbb{G}$, $\vec{x} = (a_1, \ldots, a_n)$ be a vector in $\mathbb{Z}_p^{n}$,
and $M = (m_{ij})$ be an $n \times n$ matrix. We denote $g^{\vec{x}} := (g^{a_1}, \ldots, g^{a_n}) \in \mathbb{G}^{n}$ and

\[
(g_1, \ldots, g_n)^{M} := \Bigl(\prod_{i \in [1,n]} g_i^{m_{i1}}, \ldots, \prod_{i \in [1,n]} g_i^{m_{in}}\Bigr).
\]

We can see that $(g^{\vec{x}})^{M} = g^{(\vec{x} M)}$. We also define some new notation useful for explaining product groups. Let
$G = \oplus_{i \in [1,n]} G_i$ and $H = \oplus_{j \in [1,n]} H_j$, where $G_i$ and $H_j$ are cyclic groups of the same
order. Let $e(G_i, H_j)$ be the set $\{e(g_i, h_j) \mid g_i \in G_i, h_j \in H_j\}$; hence $e(G_i, H_j)$ is a
cyclic group since $G_i$ and $H_j$ are cyclic groups. In particular, when $G_i$ and $H_j$
have prime order $p$, $e(G_i, H_j)$ is a cyclic group of order $p$ or 1.
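Now, we construct a symmetric bilinear group generator $\mathcal{G}_{SP}(\lambda, 3)$, which
is a generalization of Groth and Sahai's instantiation based on the decisional
linear assumption [24], and is also a symmetric version of Freeman's asymmetric
bilinear group generator with the projecting property [17].

1. $\mathcal{G}_1(\lambda) \to (p, \mathbb{G}, \mathbb{G}_t, \hat{e})$.

2. Set $G = \mathbb{G}^{3}$, $G_t = \mathbb{G}_t^{9}$.

3. Choose linearly independent vectors $\vec{x}_1, \vec{x}_2, \vec{x}_3 \in \mathbb{Z}_p^{3}$, and set $G_1 = \langle g^{\vec{x}_1} \rangle$,
$G_2 = \langle g^{\vec{x}_2} \rangle$ and $G_3 = \langle g^{\vec{x}_3} \rangle$. Then, $G = G_1 \oplus G_2 \oplus G_3$.

4. Define a map $e : G \times G \to G_t$ by

\[
\begin{aligned}
e((g_1, g_2, g_3), (h_1, h_2, h_3)) :=\ &\bigl(\hat{e}(g_1,h_1)^{1/2}, \hat{e}(g_1,h_2)^{1/2}, \hat{e}(g_1,h_3)^{1/2}, \hat{e}(g_2,h_1)^{1/2}, \hat{e}(g_2,h_2)^{1/2}, \hat{e}(g_2,h_3)^{1/2}, \\
&\ \ \hat{e}(g_3,h_1)^{1/2}, \hat{e}(g_3,h_2)^{1/2}, \hat{e}(g_3,h_3)^{1/2}\bigr) \\
\cdot\ &\bigl(\hat{e}(g_1,h_1)^{1/2}, \hat{e}(g_2,h_1)^{1/2}, \hat{e}(g_3,h_1)^{1/2}, \hat{e}(g_1,h_2)^{1/2}, \hat{e}(g_2,h_2)^{1/2}, \hat{e}(g_3,h_2)^{1/2}, \\
&\ \ \hat{e}(g_1,h_3)^{1/2}, \hat{e}(g_2,h_3)^{1/2}, \hat{e}(g_3,h_3)^{1/2}\bigr).
\end{aligned}
\]

Then, $e(g^{\vec{x}}, g^{\vec{y}}) = \hat{e}(g, g)^{\frac{1}{2}(\vec{x} \otimes \vec{y}) + \frac{1}{2}(\vec{y} \otimes \vec{x})}$, where $\otimes$ is the tensor product
(Kronecker product) of two 3-dimensional vectors.

5. For $i \in [1,3]$, define maps $\pi_i : G \to G$ and $\pi_{t,i} : G_t \to G_t$ by

\[
\pi_i(g) = g^{M^{-1} U_i M} \quad\text{and}\quad \pi_{t,i}(g_t) = g_t^{(M^{-1} U_i M) \otimes (M^{-1} U_i M)}, \text{ respectively,}
\]

where $M$ is a $3 \times 3$ matrix having $\vec{x}_i$ as its $i$-th row, $U_i$ is a $3 \times 3$ matrix
with 1 in the $(i,i)$ entry and zeroes elsewhere, and $\otimes$ is the tensor product
of matrices: for an $\ell_1 \times \ell_2$ matrix $A = (a_{i,j})$ and an $\ell_3 \times \ell_4$ matrix $B = (b_{i,j})$,
$A \otimes B$ is the $\ell_1 \ell_3 \times \ell_2 \ell_4$ matrix whose $(i,j)$-th block is equal to $a_{i,j} B$, where
we consider $A \otimes B$ as $\ell_1 \times \ell_2$ blocks. Then, $\pi_i$ is a projection such that for
$g_1 \in G_1$, $g_2 \in G_2$, $g_3 \in G_3$, $\pi_i(g_1 g_2 g_3)$ is equal to $g_i$.

6. Output $(p, G, G_1, G_2, G_3, G_t, e, \pi_1, \pi_2, \pi_3, \pi_{t,1}, \pi_{t,2}, \pi_{t,3})$.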


We provide a useful lemma to understand the structure of the image of e. 


Lemma 1. The image of $e$ generated by $\mathcal{G}_{SP}$ is equal to $\oplus_{1 \le i \le j \le 3}\, e(G_i, G_j)$,
and the order of each $e(G_i, G_j)$ is $p$.


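We provide the proof of Lemma 1 in the full version of this paper. Non-degeneracy
of $e$ follows directly from Lemma 1. (That is, $e(g, h) \neq 1_t$
for any non-identity elements $g, h \in G$. If not, the image is not equal to
$\oplus_{1 \le i \le j \le 3}\, e(G_i, G_j)$.) The bilinear property of $e$ can be easily checked from the
bilinear property of the tensor product. Further, $\mathcal{G}_{SP}$ satisfies the projecting
property: Let $G' = G_2 \oplus G_3$, $G_t' = \oplus_{2 \le i \le j \le 3}\, e(G_i, G_j)$, $\pi = \pi_1$, and $\pi_t = \pi_{t,1}$,
where $G'$, $G_t'$, $\pi$, and $\pi_t$ are as in Definition 4. Then, $G' \subset \ker(\pi)$ and
$G_t' \subset \ker(\pi_t)$, and $e$, $\pi$, $\pi_t$ satisfy the commutative property
$\pi_t(e(g, h)) = e(\pi(g), \pi(h))$ for all $g, h \in G$.

We can check this commutative property as follows:

\[
\begin{aligned}
\pi_t(e(g^{\vec{x}}, g^{\vec{y}}))
&= \pi_{t,i}(e(g^{\vec{x}}, g^{\vec{y}})) \\
&= \pi_{t,i}\bigl(\hat{e}(g, g)^{\frac{1}{2}(\vec{x} \otimes \vec{y}) + \frac{1}{2}(\vec{y} \otimes \vec{x})}\bigr) \\
&= \hat{e}(g, g)^{\left(\frac{1}{2}(\vec{x} \otimes \vec{y}) + \frac{1}{2}(\vec{y} \otimes \vec{x})\right)\left((M^{-1} U_i M) \otimes (M^{-1} U_i M)\right)} \\
&= \hat{e}(g, g)^{\frac{1}{2}(\vec{x} \otimes \vec{y})\left((M^{-1} U_i M) \otimes (M^{-1} U_i M)\right) + \frac{1}{2}(\vec{y} \otimes \vec{x})\left((M^{-1} U_i M) \otimes (M^{-1} U_i M)\right)} \\
&= \hat{e}(g, g)^{\frac{1}{2}(\vec{x} M^{-1} U_i M) \otimes (\vec{y} M^{-1} U_i M) + \frac{1}{2}(\vec{y} M^{-1} U_i M) \otimes (\vec{x} M^{-1} U_i M)} \\
&= e\bigl(g^{\vec{x} M^{-1} U_i M}, g^{\vec{y} M^{-1} U_i M}\bigr) \\
&= e\bigl((g^{\vec{x}})^{M^{-1} U_i M}, (g^{\vec{y}})^{M^{-1} U_i M}\bigr) \\
&= e(\pi_i(g^{\vec{x}}), \pi_i(g^{\vec{y}})) = e(\pi(g^{\vec{x}}), \pi(g^{\vec{y}})).
\end{aligned}
\]

The fifth equality comes from the property of the tensor product that
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, where $A$ and $B$ are matrices having $\ell$ columns and
$C$ and $D$ are matrices having $\ell$ rows for some $\ell$. (We can consider a vector as a
matrix having one row.)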
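The commutative property can also be checked numerically in a toy model that tracks only exponents: a group element $g^{\vec{x}}$ is represented by the vector $\vec{x}$ over $\mathbb{Z}_p$, and a target element $\hat{e}(g,g)^{\vec{v}}$ by $\vec{v}$. The sketch below makes that assumption explicit; the prime, the matrix $M$ (chosen with determinant 1 so its inverse is integral), and all function names are hypothetical, and a model with known discrete logarithms offers no security.

```python
P = 1019  # toy prime group order

# Rows of M play the role of the basis vectors x_1, x_2, x_3; det(M) = 1.
M = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
M_INV = [[-24, 18, 5], [20, -15, -4], [-5, 4, 1]]


def matmul(A, B):
    """Matrix product with entries reduced modulo P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]


def kron(A, B):
    """Kronecker (tensor) product of two matrices modulo P."""
    return [[a * b % P for a in ra for b in rb] for ra in A for rb in B]


def vecmat(v, A):
    """Row vector times matrix."""
    return matmul([v], A)[0]


def pair(x, y):
    """Exponent vector of e(g^x, g^y): (x (x) y)/2 + (y (x) x)/2."""
    half = pow(2, -1, P)
    xy, yx = kron([x], [y])[0], kron([y], [x])[0]
    return [half * (a + b) % P for a, b in zip(xy, yx)]


def proj_matrix(i):
    """M^{-1} U_i M for a zero-based index i."""
    U = [[int(r == c == i) for c in range(3)] for r in range(3)]
    return matmul(matmul(M_INV, U), M)


IDENTITY = [[int(r == c) for c in range(3)] for r in range(3)]
assert matmul(M, M_INV) == IDENTITY   # sanity check on the hard-coded inverse

P1 = proj_matrix(0)
x, y = [3, 1, 4], [1, 5, 9]
lhs = vecmat(pair(x, y), kron(P1, P1))        # pi_t(e(g^x, g^y))
rhs = pair(vecmat(x, P1), vecmat(y, P1))      # e(pi(g^x), pi(g^y))
mixed = [27, 37, 18]                          # 2*x_1 + 3*x_2 + 5*x_3
projected = vecmat(mixed, P1)                 # keeps only the x_1 component
```

In this model `lhs` and `rhs` coincide, `projected` equals `[2, 4, 6]` (that is, $2\vec{x}_1$), and `pair(M[0], M[1])` is non-zero, which matches the later remark that the construction is not cancelling.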

In contrast to the composite-order bilinear group, our product group of
prime-order groups has an additional property, which we name translating and define as follows.

Definition 9. A bilinear group generator $\mathcal{G}$ is (i, j)-translating if there exist
efficiently computable (that is, polynomial time in $\lambda$) maps $T_{i,j} : G_i^{2} \times G_j \to G_j$
defined by $(g_i, g_i^{a}, g_j) \mapsto g_j^{a}$ and $\bar{T}_{i,j} : H_i^{2} \times H_j \to H_j$ defined by $(h_i, h_i^{a}, h_j) \mapsto h_j^{a}$
for an integer $a$. If $\mathcal{G}$ is a symmetric bilinear group generator, then set
$T_{i,j} = \bar{T}_{i,j}$.

We show that the above $\mathcal{G}_{SP}$ construction satisfies the translating property.


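Theorem 1. $\mathcal{G}_{SP}(\lambda, 3)$ satisfies the translating property for all $i, j \in [1,3]$.

Proof. We first construct $T_{3,1}$. Given $g_3^{a}$ and a $3 \times 3$ matrix $M$ defined as in the
description of $\mathcal{G}_{SP}$, we can compute $g_1^{a}$ without knowing $a$ as follows:

\[
\begin{aligned}
(g_3^{a})^{M^{-1}} &= (g^{a \vec{x}_3})^{M^{-1}} = g^{a \vec{x}_3 M^{-1}} = g^{a \vec{e}_3} = (1, 1, g^{a}), \\
(g^{a}, 1, 1)^{M} &= (g^{a \vec{e}_1})^{M} = g^{a \vec{e}_1 M} = g^{a \vec{x}_1} = g_1^{a},
\end{aligned}
\]

where $\vec{e}_i$ is the canonical $i$-th vector in $\mathbb{Z}_p^{3}$, for example, $\vec{e}_1 = (1, 0, 0)$. We can
construct the other $T_{i,j}$ analogously.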
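The two steps of the proof can be run on a small concrete group. The sketch below is an illustration under stated assumptions: the base group is the order-1019 subgroup of $\mathbb{Z}_{2039}^{*}$ (no pairing is needed for this step), the matrix $M$ and all function names are hypothetical, and the inputs are the canonical generators $g^{\vec{x}_i}$ raised to $a$, as in the proof.

```python
P = 1019            # prime order of the toy group
Q = 2 * P + 1       # 2039 is prime, so Z_Q^* contains a subgroup of order P
G = 4               # 4 = 2^2 is a non-trivial square modulo Q, hence has order P

M = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]   # rows play the role of x_1, x_2, x_3


def mat_inv_mod(A, p):
    """Invert a square matrix modulo a prime p by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[v % p for v in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, p)
        aug[c] = [v * inv % p for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vec_pow_mat(v, A):
    """(v_1, ..., v_n)^A: the j-th output is the product of v_i^{A[i][j]}."""
    out = []
    for j in range(len(v)):
        acc = 1
        for i in range(len(v)):
            acc = acc * pow(v[i], A[i][j], Q) % Q
        out.append(acc)
    return out


M_INV = mat_inv_mod(M, P)


def translate(gi_a, i, j):
    """Map g_i^a = g^{a x_i} to g_j^a = g^{a x_j} without learning a (zero-based i, j)."""
    u = vec_pow_mat(gi_a, M_INV)   # g^{a e_i}: g^a in slot i, identity elsewhere
    w = [1, 1, 1]
    w[j] = u[i]                    # move g^a into slot j, giving g^{a e_j}
    return vec_pow_mat(w, M)       # g^{a e_j M} = g^{a x_j}


a = 123                                         # used only to build the test input
g3_a = [pow(G, a * x % P, Q) for x in M[2]]     # g_3^a as a vector of group elements
g1_a = translate(g3_a, 2, 0)
```

The function `translate` never receives `a`; it only uses the public matrix and group operations, which is the point of the translating property.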


Moreover, $\mathcal{G}_{SP}$ satisfies the (3,2)-subgroup decision assumption when the
underlying group generator $\mathcal{G}_1$ satisfies the decisional linear assumption.

Lemma 2. If $\mathcal{G}_1$ satisfies the decisional linear assumption, then $\mathcal{G}_{SP}$ satisfies
the (3,2)-subgroup decision assumption.

We relegate the proof of Lemma 2 to the full version of this paper.

Remark 1. Note that $\mathcal{G}_{SP}$ does not satisfy the cancelling property since $e(G_i, G_j)$
is not equal to $\{1_t\}$ for $i \neq j$ (Lemma 1).

3.2 Construction

The outline of our scheme looks very similar to Meiklejohn et al.'s
construction in the composite-order bilinear group [30]. We slightly changed
Meiklejohn et al.'s construction to adapt it to the prime-order bilinear group
setting.

(Partially) blind signature schemes in the common reference string model consist of
five (interactive) algorithms: Setup, KeyGen, User, Signer, and Verify. We provide
the formal definition of (partially) blind signature schemes, and of concurrent
security, in the full version of this paper. We follow the security definition of [30],
which is slightly stronger than [6], by allowing the adversary to choose the public
key in the blindness definition. As a definition of the blind signature, [30] is
modified from [27]: (1) it strengthens the blindness game to allow the adversary
to generate the public key, and (2) it weakens the one-more unforgeability game
to require that the messages (instead of pairs of message and signature) must
all be distinct.³

The proposed partially blind signature scheme for a message space M =
{0,1} is as follows.⁴


• Setup($\lambda$): $\mathcal{G}_{SP}(\lambda, 3) \xrightarrow{\$} (p, G, G_1, G_2, G_3, G_t, e, \pi_i, \pi_{t,i})$. Choose $g, u', u_1, \ldots,
u_m, v_1, \ldots, v_m \xleftarrow{\$} G$, $h_1 \xleftarrow{\$} G_1$ and $h_2 \xleftarrow{\$} G_2$. Define

CRS = $(p, G, G_t, e, g, u', u_1, \ldots, u_m, v_1, \ldots, v_m, h_1, h_2)$.

• KeyGen(CRS): Choose $g' \xleftarrow{\$} G$. Set $A = e(g, g')$. The public key is PK = $\{A\}$,
and the secret key is SK = $\{g'\}$.

• User(CRS, PK, info, Msg): Let info be an $m_0$-bit string and Msg be an
$(m - m_0)$-bit string. We write info bitwise as $b_1 \cdots b_{m_0}$ and Msg as $b_{m_0+1} \cdots b_m$.
For $i \in [m_0 + 1, m]$, pick random integers $t_{i,1}, t_{i,2}, s_{i,1}, s_{i,2}, r_i, r_i' \xleftarrow{\$} \mathbb{Z}_p$, and
compute


— „biSi,1/,bi—lpSi,1 pSi,2\ti 1 pri — , bi 8i,2 7, bi—lpSi,1 psi, 2\ti2p—ri 
bia =U, (Ue h he?) ha, Oi =u? (vp hi ha? ehr, 
( 
au 


bi—1)8i,1 7, bs 7 $i,1 7 84,2) tg pri (01-1) 8i,2 7, bi 1 84,1 2 84,2) ty o p i 
i (v; hy hg”) the’, Oia = u (v; hy hgt) 2h] $. 


— 
Let $\vec{\theta}_i = (\theta_{i,1}, \ldots, \theta_{i,4})$, and send req = $(c_i, d_i, \vec{\theta}_i)_{i \in [m_0+1, m]}$ to the
signer and save state = $\{(t_{i,1}, t_{i,2})\}_{i \in [m_0+1, m]}$.
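• Signer(CRS, SK, info, req): Write req $= \{(c_i, d_i, \theta_i)\}_{i \in [m_0+1, m]}$
  and info $= b_1 \cdots b_{m_0}$. For each $i \in [m_0+1, m]$, verify that $c_i$
  is a commitment to 0 or 1 by checking that

  \[
  e(c_i, d_i v_i^{-1}) = e(h_1, \theta_{i,1})\, e(h_2, \theta_{i,2})
  \quad\text{and}\quad
  e(c_i u_i^{-1}, d_i) = e(h_1, \theta_{i,3})\, e(h_2, \theta_{i,4}).
  \]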


3 This weakened definition is necessary if the output signature can be
re-randomized, as is the case for the partially blind signature of [30] and
for ours.
4 For large message spaces, we can use a collision-resistant hash function first.
  If for some $i$ the above equations do not hold, abort the protocol and output
  $\perp$. Otherwise, compute

  \[ c = \Big(u' \prod_{i \in [1, m_0]} u_i^{b_i}\Big) \Big(\prod_{i \in [m_0+1, m]} c_i\Big), \]

  choose a random integer $r \xleftarrow{\$} Z_p$, compute

  \[ K_1 = g' c^{r}, \quad K_2 = g^{-r}, \quad K_{3,1} = h_1^{-r}, \quad K_{3,2} = h_2^{-r}, \]

  send $(K_1, K_2, K_{3,1}, K_{3,2})$ to the user, and output success and info.
• User(state, $(K_1, K_2, K_{3,1}, K_{3,2})$): Write state
  $= \{(t_{i,1}, t_{i,2})\}_{i \in [m_0+1, m]}$. Check that

  \[ e(K_{3,1}, g) = e(K_2, h_1) \quad\text{and}\quad e(K_{3,2}, g) = e(K_2, h_2). \]

  If either of the two equations above fails to hold, abort the protocol and
  output $\perp$. Otherwise, unblind the signature by computing

  \[ S_1 = K_1 \cdot \prod_{i \in [m_0+1, m]} K_{3,1}^{t_{i,1}} K_{3,2}^{t_{i,2}} \quad\text{and}\quad S_2 = K_2. \]

  Check the validity of the signature $(S_1, S_2)$ by running Verify. If it
  outputs accept, go to the next step. Otherwise, abort the protocol and output
  $\perp$. Finally, re-randomize the signature by picking a random
  $s \xleftarrow{\$} Z_p$ and computing

  \[ S_1' = S_1 \Big(u' \prod_{i \in [1, m]} u_i^{b_i}\Big)^{s} \quad\text{and}\quad S_2' = S_2 \cdot g^{-s}. \]

  Output the signature sig $= (S_1', S_2')$, info, and success.

• Verify(CRS, PK, info, Msg, sig): Write PK = {A}, info $= b_1 \cdots b_{m_0}$,
  Msg $= b_{m_0+1} \cdots b_m$, and sig $= (S_1, S_2)$. Check that

  \[ e(S_1, g) \cdot e\Big(S_2, u' \prod_{i \in [1, m]} u_i^{b_i}\Big) = A. \]

  If the above equality holds, then output accept. Otherwise, output fail.


In the first procedure of the user, $c_i$ and $d_i$ are GS-commitments to $b_i$,
and $\theta_i$ is a GS-proof that $b_i$ satisfies the equation $b_i(b_i - 1) = 0$,
so that $b_i = 0$ or $b_i = 1$. More precisely, when $b_i$ and $b_i'$ are the
openings of $c_i$ and $d_i$, respectively, $\theta_i$ is a proof that
$b_i(b_i' - 1) = 0$ and $(b_i - 1) b_i' = 0$. Then, ($b_i = 0$ or $b_i' = 1$)
$\wedge$ ($b_i = 1$ or $b_i' = 0$), so that $b_i = b_i' = 0$ or $b_i = b_i' = 1$.
We provide three theorems to prove the security of the proposed (partially)
blind signature scheme.
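
The claim that the two proved equations force both openings to be the same bit can be checked exhaustively for a small prime. The following is only an illustrative sketch; the function name `joint_solutions` and the choice of a small modulus are assumptions made for the example.

```python
def joint_solutions(p):
    """All (b, b') in Z_p x Z_p with b(b' - 1) = 0 and (b - 1)b' = 0 mod p."""
    return [(b, bp)
            for b in range(p)
            for bp in range(p)
            if (b * (bp - 1)) % p == 0 and ((b - 1) * bp) % p == 0]
```

For a prime modulus, the only pairs returned are (0, 0) and (1, 1), which matches the case analysis above.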


Theorem 2. The above blind signature is correct. 


Theorem 3. If $\mathcal{G}_1$ satisfies the decisional linear assumption, then
the above blind signature satisfies blindness.

The proofs of Theorems 2 and 3 are similar to the previous ones [30]. We
provide the proofs in the full version of this paper.


Theorem 4. If $\mathcal{G}_1$ satisfies the CDH assumption, then the above
blind signature is one-more unforgeable.


Due to space constraints, we leave the proof of Theorem 4 to the full version of
this paper. Instead, we briefly explain our idea for proving one-more
unforgeability, and the reason why we cannot apply the proof strategy of
Meiklejohn et al. to the proposed scheme. At the end of the interaction, the
user obtains a Waters signature, which is existentially unforgeable under the
CDH assumption. If the user obtained only a Waters signature, then the proposed
scheme would be, loosely speaking, also one-more unforgeable. However, the user
obtains not only a Waters signature (of the form
$g' (u' \prod_{i \in [1,m]} u_i^{b_i})^{r}$ and $g^{-r}$ for the message
$b_1 \cdots b_m$), but also some additional information; that is, it eventually
gets

\[
g' \Big(u' \prod_{i \in [1,m]} u_i^{b_i}\Big)^{r} \Big(\prod_{i \in [m_0+1,m]} h_1^{t_{i,1}} h_2^{t_{i,2}}\Big)^{r},
\quad g^{-r}, \quad h_1^{-r}, \quad\text{and}\quad h_2^{-r}
\]

for some (unknown and uniformly distributed) $r \in Z_p$, and $t_{i,1}$,
$t_{i,2}$, and $b_i$ chosen by itself. Therefore, we should show that
$h_1^{-r}$, $h_2^{-r}$, and
$(\prod_{i \in [m_0+1,m]} h_1^{t_{i,1}} h_2^{t_{i,2}})^{r}$ will not help the
user break one-more unforgeability. In [30], the pairing $e$ satisfies the
cancelling property, and the orders of the subgroups are relatively prime, so
that the parts of a signature contained in different subgroups are independent.
[30] essentially exploits this independence. If, in our scheme, the
$G_1 \oplus G_2$ part and the $G_3$ part were independent, the user could not
obtain any additional information about the part in $G_3$ from the above
information. (Since all information other than the Waters signature that the
user gets at the end of the protocol is related to $h_1$ and $h_2$, which are
elements of $G_1 \oplus G_2$, this information would not help in forging the
Waters signature in the $G_3$ part.) Hence, the one-more unforgeability of the
scheme could be reduced to the existential unforgeability of the Waters
signature (in $G_3$ in the case of our scheme). However, we cannot apply this
proof strategy of Meiklejohn et al. to our scheme, since our bilinear map $e$
does not have the cancelling property and each subgroup has the same order $p$.
Instead, we prove one-more unforgeability using a completely different
strategy. Our simulation basically follows the simulation for the existential
unforgeability of the Waters signature, and at the same time directly simulates
the additional information $h_1^{-r}$, $h_2^{-r}$, and
$(\prod_{i \in [m_0+1,m]} h_1^{t_{i,1}} h_2^{t_{i,2}})^{r}$. It seems hard to
simulate $(\prod_{i \in [m_0+1,m]} h_1^{t_{i,1}} h_2^{t_{i,2}})^{r}$, since
$t_{i,1}$ and $t_{i,2}$ are chosen by the user and $r$ is usually not known to
the simulator during the simulation. ($r$ is usually of the form $Ra + S$ for
some unknown $a$ and constants $R$ and $S$, where $a$ is given in the form
$g^{a}$.) We circumvent this obstacle by using the projecting property and the
translating property mentioned in Section 3.1. To simulate this additional
information, the simulator first extracts the message, that is, recovers
$b_1 \cdots b_m$ by computing $\log_{\pi_3(u_i)} \pi_3(c_i) = b_i$, and second
computes $\pi_j(c_i / u_i^{b_i}) = h_j^{t_{i,j}}$ (for $j \in \{1, 2\}$) and


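\[
\begin{cases}
\pi_3(\theta_{i,1}) = \pi_3(v_i)^{-t_{i,1}}, \quad \pi_3(\theta_{i,2}) = \pi_3(v_i)^{-t_{i,2}} & \text{if } b_i = 0, \\
\pi_3(\theta_{i,3}) = \pi_3(v_i)^{t_{i,1}}, \quad \pi_3(\theta_{i,4}) = \pi_3(v_i)^{t_{i,2}} & \text{if } b_i = 1.
\end{cases}
\]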
Though $\pi_3(v_i)^{t_{i,j}}$ is contained in $G_3$, we can change it to be of
the form $h_j^{a t_{i,j}}$ for some unknown $a$ by using the translating
property mentioned in Section 3.1 when $v_i$ contains $a$ in the exponent. The
simulator can generate
$(\prod_{i \in [m_0+1,m]} h_1^{t_{i,1}} h_2^{t_{i,2}})^{r}$ by using
$h_j^{a t_{i,j}}$ and $h_j^{t_{i,j}}$.


Remark 2. The decisional linear assumption implies the CDH assumption. (The 
decisional linear assumption implies the computational linear assumption, and 
the computational linear assumption implies the CDH assumption. Reductions 
are quite straightforward.) 
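
The second implication in Remark 2 can be sketched as follows. This is an illustrative reconstruction, not the argument of the full version: it assumes the standard formulations of both problems in a cyclic group of prime order, and the solver $\mathcal{A}$ and the exponents $\alpha, \beta, x, y$ are hypothetical names introduced only for this sketch.

```latex
% Assumed CDH solver: \mathcal{A}(w, w^{\alpha}, w^{\beta}) = w^{\alpha\beta}
% for any generator w.
% Computational linear instance: (g, f, h, f^{x}, h^{y}); goal: g^{x+y}.
\begin{align*}
  g = f^{\alpha} \text{ for some unknown } \alpha
    &\;\Longrightarrow\; \mathcal{A}(f, g, f^{x}) = f^{\alpha x} = g^{x}, \\
  g = h^{\beta} \text{ for some unknown } \beta
    &\;\Longrightarrow\; \mathcal{A}(h, g, h^{y}) = h^{\beta y} = g^{y}, \\
    &\;\phantom{\Longrightarrow}\; g^{x} \cdot g^{y} = g^{x+y}.
\end{align*}
```

For the first implication, an algorithm that computes $g^{x+y}$ decides a decisional linear instance by comparing its output with the challenge element.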


Remark 3. In the user's first procedure, the GS-commitment and proof appear
to have redundant parts. It would be more natural to change them to

\[
c_i = u_i^{b_i} h_1^{t_{i,1}} h_2^{t_{i,2}}, \quad
\theta_{i,1} = \big(u_i^{2b_i - 1} h_1^{t_{i,1}} h_2^{t_{i,2}}\big)^{t_{i,1}} h_2^{r_i}, \quad
\theta_{i,2} = \big(u_i^{2b_i - 1} h_1^{t_{i,1}} h_2^{t_{i,2}}\big)^{t_{i,2}} h_1^{-r_i},
\]

which can be verified by checking
$e(c_i, c_i u_i^{-1}) = e(h_1, \theta_{i,1})\, e(h_2, \theta_{i,2})$. This
commitment and proof are a GS commitment and proof for $b_i \in \{0, 1\}$.
However, we note that
in this case, we could not prove the one-more unforgeability based on the CDH 
assumption. We only proved the one-more unforgeability based on the decisional 
linear assumption and augmented CDH assumption. (Augmented CDH assump- 
tion roughly says that given g, 9%, 9°, g, it is infeasible to compute g”.) To 
avoid requiring g”, in the simulation, that is, to prove the one-more unforge- 
ability based on the CDH assumption, we modified the commitment and the 
proof to the current form. 


4 Bilinear Group: Both Cancelling and Projecting 


4.1 Interpreting the Limitation Result in [30]


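In [30], the authors consider the case where the bilinear group generator
$\mathcal{G}(\lambda, n)$ is defined as follows:


1. $(p, \mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e}) \xleftarrow{\$} \mathcal{G}_1(\lambda)$.
2. $G = \mathbb{G}^n$, $H = \mathbb{H}^n$, and $G_t = \mathbb{G}_t^m$ for some positive integer $m$.
3. A bilinear map $e : G \times H \to G_t$ is defined by

\[
\begin{aligned}
e((g_1, \ldots, g_n), (h_1, \ldots, h_n))
 &= \big(\cdots, e((g_1, \ldots, g_n), (h_1, \ldots, h_n))^{(\ell)}, \cdots\big) \\
 &= \Big(\cdots, \prod_{i,j \in [1,n]} \hat{e}(g_i, h_j)^{a_{ij}^{(\ell)}}, \cdots\Big),
\end{aligned}
\]

where $a_{ij}^{(\ell)} \in Z_p$ for all $i, j \in [1, n]$ and $\ell \in [1, m]$.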


The authors showed that $e$ can be both cancelling and projecting only with
negligible probability when $e$ is defined as above. In the above construction
of $\mathcal{G}$, to generate a rank $n$ $Z_p$-module, $G$ is defined as
$\mathbb{G}^n$. In the proof of the limitation result ([30, Proposition 6.4 and
Theorem 6.5]), the authors used, in an essential way, the fact that a rank $n$
$Z_p$-module is of the form $\mathbb{G}^n$.

We can, however, also define a rank $n$ $Z_p$-module $G$ in a different way.
First generate a rank $n'$ ($> n$) $Z_p$-module $\widetilde{G}$, and then define
$G$ as a rank $n$ $Z_p$-submodule of $\widetilde{G}$. For example, define
$\widetilde{G} = \mathbb{G}^4$ and

\[
G = \big\langle (g^{a_1}, g^{b_1}, g^{c_1}, g^{d_1}),\ (g^{a_2}, g^{b_2}, g^{c_2}, g^{d_2}),\ (g^{a_3}, g^{b_3}, g^{c_3}, g^{d_3}) \big\rangle,
\]

where $\{(a_i, b_i, c_i, d_i)\}_{i \in [1,3]}$ is a set of linearly independent
vectors in $Z_p^4$. Then, $G$ is a rank 3 $Z_p$-submodule of the rank 4
$Z_p$-module $\widetilde{G}$. This example is not covered by the above
construction of $\mathcal{G}$. In this example, we should discuss the
membership check for $G$, since membership in any group used for cryptographic
applications should be easy to check. If there is no additional information,
the membership check for $G$ is infeasible, since it is equivalent to the
decisional 3-linear problem. However, we should not rule out this case when
some additional information for the membership check is given. Our
construction is exactly such a case.
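
The linear-independence condition on the exponent vectors in this example can be checked by computing a rank over $Z_p$. The sketch below is illustrative only; `rank_mod_p` is a hypothetical helper, and it works on exponent vectors directly (which the party generating the group knows), not on group elements.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over Z_p (p prime), by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank
```

Three exponent vectors in $Z_p^4$ generate a rank 3 submodule exactly when `rank_mod_p` returns 3 on them.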


4.2 Our Construction 


First, we give some intuition for our construction. To construct a bilinear
group generator with the projecting property, we should consider the order of
the image of the bilinear map, which should be larger than the prime $p$ (see
footnote 5). We start from a bilinear group generator with the cancelling
property [17]. We consider $n$ different bilinear group generators (of rank
$n$) with the cancelling property. Let $G^{(i)} = \oplus_{j \in [1,n]} G_{ij}$
(a rank $n$ $Z_p$-module), $H^{(i)} = \oplus_{j \in [1,n]} H_{ij}$ (a rank $n$
$Z_p$-module), and $\hat{e}_i$ (a bilinear map) be the output of the $i$-th
bilinear group generator. Let $G_{ij} = \langle g_{ij} \rangle$, which is a rank
1 $Z_p$-submodule of a rank $n$ $Z_p$-module. Let $G_j$ be
$\langle (g_{1j}, \cdots, g_{nj}) \rangle$, which is a rank 1 $Z_p$-submodule of
a rank $n^2$ $Z_p$-module (the direct product of $n$ rank $n$ $Z_p$-modules).
Define $H_j$ similarly, and define $G = \oplus_{j \in [1,n]} G_j$ and
$H = \oplus_{j \in [1,n]} H_j$. We define a map $e$ by using the bilinear maps
$\hat{e}_i$ defined over each $G^{(i)} \times H^{(i)}$ as follows:

\[ e((g_1, \cdots, g_n), (h_1, \cdots, h_n)) = (\hat{e}_1(g_1, h_1), \cdots, \hat{e}_n(g_n, h_n)), \]

where $g_i \in G^{(i)}$ and $h_i \in H^{(i)}$. This construction also satisfies
the cancelling property. If we can control the basis of the image of $e$ so
that the order of the image is not the prime $p$, then we may obtain the
projecting property.

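For vectors $\Gamma = (\vec\alpha_1, \cdots, \vec\alpha_n) = (\alpha_{11}, \cdots, \alpha_{nn})$
and $\Lambda = (\vec\beta_1, \cdots, \vec\beta_n) = (\beta_{11}, \cdots, \beta_{nn}) \in Z_p^{n^2}$,
and a group element $g \in \mathbb{G}$, we define the notation
$\Gamma \circ \Lambda := (\vec\alpha_1 \cdot \vec\beta_1, \cdots, \vec\alpha_n \cdot \vec\beta_n) \in Z_p^n$,
where the $\vec\alpha_i$'s and $\vec\beta_i$'s are vectors in $Z_p^n$, and
$\vec\alpha_i \cdot \vec\beta_i = \sum_{\ell \in [1,n]} \alpha_{i\ell} \beta_{i\ell}$.
Now, we describe our construction $\mathcal{G}_{CP}$.


1. Take a security parameter and a positive integer $n$ as inputs, run
   $\mathcal{G}_1$, and obtain $(p, \mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e})$.
2. Choose generators $g$ and $h$ at random from $\mathbb{G}$ and $\mathbb{H}$, respectively.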


5 If the order of the image of a bilinear map is the prime p, it cannot satisfy the projecting property [30].
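3. Choose $X_1, \cdots, X_n$ and $D$ from $GL_n(Z_p)$ at random. Define
   $D_i \in Mat_n(Z_p)$ to be the diagonal matrix having $D$'s $i$-th column
   vector as its diagonal. Define $Y_i$ by $D_i (X_i^{-1})^t$.

4. Let $\vec\psi_{\ell,i}$ be the $i$-th row of $X_\ell$ and $\vec\phi_{\ell,i}$
   be the $i$-th row of $Y_\ell$. Let
   $\Psi_i = (\vec\psi_{1,i}, \cdots, \vec\psi_{n,i})$ and
   $\Phi_i = (\vec\phi_{1,i}, \cdots, \vec\phi_{n,i})$. Then, define $G_i$ to be
   the cyclic subgroup of $\mathbb{G}^{n^2}$ generated by $g^{\Psi_i}$, and
   define $H_i$ to be the cyclic subgroup of $\mathbb{H}^{n^2}$ generated by
   $h^{\Phi_i}$.

5. Define $G$ and $H$ by the internal direct product of the $G_i$'s and
   $H_i$'s, respectively. That is,
   $G = \oplus_{i \in [1,n]} G_i \subset \mathbb{G}^{n^2}$ and
   $H = \oplus_{i \in [1,n]} H_i \subset \mathbb{H}^{n^2}$. Define $G_t$ by
   $\mathbb{G}_t^n$.

6. Define a map $e : G \times H \to G_t$ as follows:

   \[
   e(g^{\Gamma}, h^{\Lambda})
   = \Big(\prod_{\ell \in [1,n]} \hat{e}(g^{\alpha_{1\ell}}, h^{\beta_{1\ell}}), \cdots, \prod_{\ell \in [1,n]} \hat{e}(g^{\alpha_{n\ell}}, h^{\beta_{n\ell}})\Big)
   = \hat{e}(g, h)^{\Gamma \circ \Lambda},
   \]

   for any $\Gamma = (\alpha_{11}, \cdots, \alpha_{nn})$ and
   $\Lambda = (\beta_{11}, \cdots, \beta_{nn})$.

7. Take a basis of $\langle \Psi_1, \cdots, \Psi_n \rangle^{\perp}$ at random, say
   $\{\hat\Psi_1, \cdots, \hat\Psi_{n^2-n}\}$, and take a basis of
   $\langle \Phi_1, \cdots, \Phi_n \rangle^{\perp}$ at random, say
   $\{\hat\Phi_1, \cdots, \hat\Phi_{n^2-n}\}$, where the notation
   $\langle \Gamma_1, \cdots, \Gamma_n \rangle^{\perp}$ means the set of all
   vectors orthogonal to $\langle \Gamma_1, \cdots, \Gamma_n \rangle$. Define

   \[ \sigma = \big(\{h^{\hat\Psi_1}, \cdots, h^{\hat\Psi_{n^2-n}}\},\ \{g^{\hat\Phi_1}, \cdots, g^{\hat\Phi_{n^2-n}}\}\big). \]

8. Output $(G, G_1, \cdots, G_n, H, H_1, \cdots, H_n, G_t, e, \sigma)$.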


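In the description of $\mathcal{G}_{CP}$, each $G_i$ and $H_i$ is defined to be
of rank 1, as $Z_p$-submodules of $\mathbb{G}^{n^2}$, and for $i \neq j$,
$G_i \cap G_j = H_i \cap H_j = \{1_{\mathbb{G}^{n^2}}\}$, where
$1_{\mathbb{G}^{n^2}}$ is the identity of $\mathbb{G}^{n^2}$. Therefore, in step
5, $G = \oplus_{i \in [1,n]} G_i$ and $H = \oplus_{i \in [1,n]} H_i$ are
well-defined and are rank $n$ $Z_p$-submodules of $\mathbb{G}^{n^2}$.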


4.3 Cancelling, Projecting, and Translating 


It is straightforward to check that e is a non-degenerate bilinear map. We show 
that e satisfies cancelling, projecting and translating. 


Theorem 5. Let $(G = \oplus_{i \in [1,n]} G_i, G_i, H = \oplus_{i \in [1,n]} H_i, H_i, G_t, e, \sigma)$
be the output of the above $\mathcal{G}_{CP}$. Then, $e$ is both cancelling and projecting.


Proof. Let $X_1, \cdots, X_n$, $Y_1, \cdots, Y_n$, and $D$ be generated as in
step 3 of Section 4.2. These satisfy the following three conditions.


(1) $X_\ell$ and $Y_\ell$ are in $GL_n(Z_p)$ for $\ell \in [1, n]$.
(2) For $\ell \in [1, n]$, each $X_\ell \cdot Y_\ell^t$ is a diagonal matrix with diagonal $\vec{d}_\ell$.
(3) $D = (\vec{d}_1 \cdots \vec{d}_n)$, that is, the $i$-th column vector of $D$ is $\vec{d}_i$.


From condition (1) we can see that the $\Psi_i$'s are linearly independent and
the $\Phi_i$'s are linearly independent, and so $G = \oplus_{i \in [1,n]} G_i$
and $H = \oplus_{i \in [1,n]} H_i$ are well-defined. Condition (2) guarantees
that $e$ is a cancelling bilinear map: for $i \neq j$,
$\Psi_i \circ \Phi_j := (\vec\psi_{1,i} \cdot \vec\phi_{1,j}, \cdots, \vec\psi_{n,i} \cdot \vec\phi_{n,j}) = \vec{0}$,
and so $e(g^{\Psi_i}, h^{\Phi_j}) = \hat{e}(g, h)^{\Psi_i \circ \Phi_j} = (1_{\mathbb{G}_t}, \cdots, 1_{\mathbb{G}_t})$
is equal to the identity of the product group $(\mathbb{G}_t)^n$. Condition (3)
implies that $\{\Psi_i \circ \Phi_i\}_{i \in [1,n]}$ is a set of linearly
independent vectors in $Z_p^n$; hence, any two distinct groups of the form
$e(G_i, H_i) = \langle \hat{e}(g, h)^{\Psi_i \circ \Phi_i} \rangle = \langle \hat{e}(g, h)^{(d_{1,i}, \cdots, d_{n,i})} \rangle$,
where $d_{\ell,i}$ denotes the $i$-th entry of $\vec{d}_\ell$, have no common
element except the identity, so that
$Im(e) = \oplus_{i \in [1,n]} e(G_i, H_i) = G_t$. We can consider the natural
projections $\pi_i : G \to G_i$, $\bar\pi_i : H \to H_i$, and
$\pi_{t,i} : G_t \to e(G_i, H_i)$. We can construct these projections in a way
similar to the construction of the projections in Section 3.1. We leave the
details to the full version of this paper. Let $G' = \oplus_{i \in [2,n]} G_i$,
$H' = \oplus_{i \in [2,n]} H_i$, $G_t' = e(G', H')$, $\pi = \pi_1$,
$\bar\pi = \bar\pi_1$, and $\pi_t = \pi_{t,1}$. Then, $e$ satisfies the
definition of the projecting property.
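
Conditions (2) and (3) can be checked numerically at the level of exponents. The sketch below is illustrative only and rests on our reading of steps 3 and 4 of the construction ($Y_\ell = D_\ell (X_\ell^{-1})^t$, with $\Psi_i$ and $\Phi_i$ built from the $i$-th rows of the $X_\ell$ and $Y_\ell$); the helper names and the small prime are assumptions, and no group elements or pairings are involved.

```python
import random


def mat_inv(M, p):
    """Inverse of a square matrix over Z_p via Gauss-Jordan; None if singular."""
    n = len(M)
    A = [[x % p for x in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col]), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, p)
        A[col] = [(x * inv) % p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def transpose(M):
    return [list(r) for r in zip(*M)]


def rand_gl(n, p, rng):
    """Sample a uniformly random invertible n x n matrix over Z_p."""
    while True:
        M = [[rng.randrange(p) for _ in range(n)] for _ in range(n)]
        if mat_inv(M, p) is not None:
            return M


def sample_exponents(n, p, rng):
    D = rand_gl(n, p, rng)
    X = [rand_gl(n, p, rng) for _ in range(n)]
    Y = []
    for l in range(n):
        xinv_t = transpose(mat_inv(X[l], p))  # (X_l^{-1})^t
        # D_l = diag(l-th column of D); left multiplication scales the rows.
        Y.append([[(D[i][l] * xinv_t[i][j]) % p for j in range(n)]
                  for i in range(n)])
    return X, Y, D


def circ(X, Y, i, j, p):
    """Psi_i o Phi_j: entry l is <i-th row of X_l, j-th row of Y_l> mod p."""
    return [sum(a * b for a, b in zip(X[l][i], Y[l][j])) % p
            for l in range(len(X))]


def check(n=3, p=101, seed=0):
    rng = random.Random(seed)
    X, Y, D = sample_exponents(n, p, rng)
    cancelling = all(circ(X, Y, i, j, p) == [0] * n
                     for i in range(n) for j in range(n) if i != j)
    diag = [circ(X, Y, i, i, p) for i in range(n)]
    # diag[i] should be the i-th row of D, and these rows are independent.
    return cancelling, diag == D, mat_inv(diag, p) is not None
```

Under these assumptions, `check` reports that $\Psi_i \circ \Phi_j$ vanishes for $i \neq j$ (cancelling) and that the vectors $\Psi_i \circ \Phi_i$ are the rows of $D$, hence linearly independent (the fact used for projecting).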


Theorem 6. $\mathcal{G}_{CP}(\lambda, n)$ satisfies the translating property for all $i, j \in [1, n]$.


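Proof. We will construct $\tau_{3,1}$. We can construct the other $\tau_{i,j}$
and $\bar\tau_{i,j}$ similarly. Given $g_3$, $g_3^{a}$, and the $n \times n$
matrices $X_j$ defined as in the description of $\mathcal{G}_{CP}$, we can
compute $g_1^{a}$ without knowing $a$ as follows:

Parse $g_3^{a}$ as
$(g^{\Psi_3})^{a} = ((g^{\vec\psi_{1,3}})^{a}, \cdots, (g^{\vec\psi_{n,3}})^{a})$,
and compute

\[
\text{for } j \in [1, n], \quad
\big((g^{\vec\psi_{j,3}})^{a}\big)^{X_j^{-1}} = \big(g^{a\, \vec{e}_3 X_j}\big)^{X_j^{-1}} = g^{a\, \vec{e}_3} = (1, 1, g^{a}, \cdots, 1),
\]

where $\vec{e}_i$ is the canonical $i$-th vector in $Z_p^n$; for example,
$\vec{e}_1 = (1, 0, 0, \cdots, 0)$.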


We show in the full version that anyone knowing $\sigma$ can test membership of
elements in $G$ and $H$ (the membership test for $G_t$ is trivial). Finally, we
should show that $\mathcal{G}_{CP}$ satisfies the subgroup decision assumption,
but it is not easy to prove this for arbitrary $n$. Instead, in the full
version we give a proof that, for $n = 2$, $\mathcal{G}_{CP}$ satisfies the
(2, 1)-subgroup decision assumption in the generic bilinear group model (that
is, we assume that the adversary must access oracles for the group operations
of $\mathbb{G}$, $\mathbb{H}$, and $\mathbb{G}_t$ and for pairing computations
of $\hat{e}$, where $\mathcal{G}_1 \to (p, \mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e})$).
Though we give a proof only for the case $n = 2$, we are confident that
$\mathcal{G}_{CP}$ satisfies the subgroup decision assumption for $n > 2$. For
$n > 2$, there are several variables, particularly in $\sigma$, that we should
consider for the subgroup decision assumption, and these make the proof hard
for the case $n > 2$, even in the generic bilinear group model (see footnote 6).


5 Conclusions and Further Work 


In this paper, we answered two open questions left by Meiklejohn, Shacham, and 
Freeman. First, we showed that the security of the Meiklejohn et al.’s (partial) 


6 All variables in $\sigma$ are public, so to show that $\mathcal{G}_{CP}$ satisfies the subgroup decision
assumption, the simulator should simulate $\sigma$ in the proof.
blind signature can be proved in the prime-order bilinear group setting. Second,
we showed that there exist bilinear group generators that are both cancelling and 
projecting in the prime-order bilinear group setting. 

The security proofs of the Meiklejohn-Shacham-Freeman blind signature scheme
and of the Lewko-Waters identity-based encryption scheme [29] essentially use
the fact that the orders of the subgroups are relatively prime, as well as the
projecting and/or cancelling properties. For each scheme, an adapted version in
prime-order bilinear groups is proposed, with a different security proof
strategy, in this paper and in [29], respectively. It would be interesting to
find a general procedure for transforming such schemes, which rely on
relatively prime orders in composite-order groups, into schemes in prime-order
groups.

We proposed a new mathematical framework that is both cancelling and
projecting in the prime-order bilinear group setting, and gave a proof that the
(2, 1)-subgroup decision assumption holds in the generic bilinear group model
when n = 2. This research leaves many interesting open problems. We ask whether
the subgroup decision assumption holds when n > 2, and whether the subgroup
decision assumption can be reduced to a simple assumption such as the
(decisional) k-linear assumption. We did not find good cryptographic
applications of this framework. It would be interesting to design cryptographic
schemes based on the proposed framework. We expect that this research will
provide other directions for our basic question: whether there exists a
cryptosystem on composite-order bilinear groups that cannot be constructed on
prime-order bilinear groups.
Abstract. We revisit the question of Zero-Knowledge PCPs, studied by
Kilian, Petrank, and Tardos (STOC ’97). A ZK-PCP is defined similarly 
to a standard PCP, except that the view of any (possibly malicious) 
verifier can be efficiently simulated up to a small statistical distance. 
Kilian et al. obtained a ZK-PCP for NEXP in which the proof oracle is
in EXP^NP. They also obtained a ZK-PCP for NP in which the proof
oracle is computable in polynomial-time, but this ZK-PCP is only zero- 
knowledge against bounded-query verifiers who make at most an a priori 
fixed polynomial number of queries. The existence of ZK-PCPs for NP 
with efficient oracles and arbitrary polynomial-time malicious verifiers 
was left open. This question is motivated by the recent line of work on 
cryptography using tamper-proof hardware tokens: an efficient ZK-PCP 
(for any language) is equivalent to a statistical zero-knowledge proof 
using only a single stateless token sent to the verifier. 
We obtain the following results regarding efficient ZK-PCPs: 


Negative Result on Efficient ZK-PCPs. Assuming that the poly- 
nomial time hierarchy does not collapse, we settle the above question 
in the negative for ZK-PCPs in which the verifier is nonadaptive (i.e. 
the queries only depend on the input and secret randomness but not 
on the PCP answers). 

Simplifying Bounded-Query ZK-PCPs. The bounded-query zero- 
knowledge PCP of Kilian et al. starts from a weakly-sound bounded- 
query ZK-PCP of Dwork et al. (CRYPTO ’92) and amplifies its 
soundness by introducing and constructing a new primitive called 
locking scheme — an unconditional oracle-based analogue of a com- 
mitment scheme. We simplify the ZK-PCP of Kilian et al. by present- 
ing an elementary new construction of locking schemes. Our locking 
scheme is purely combinatorial. 

Black-Box Sublinear ZK Arguments via ZK-PCPs. Kilian used 
PCPs to construct sublinear-communication zero-knowledge argu- 
ments for NP which make a non-black-box use of collision-resistant 
hash functions (STOC ’92). We show that ZK-PCPs can be used to 
get black-box variants of this result with improved round complexity, 
as well as an unconditional zero-knowledge variant of Micali’s
non-interactive CS Proofs (FOCS ’94) in the Random Oracle Model.


Keywords: Zero-Knowledge, Probabilistically Checkable Proofs, 
Arthur Merlin Games, Tamper-Proof Tokens, Sublinear Arguments. 


1 Introduction 


The seminal work of Goldwasser, Micali, and Rackoff [30] changed the classical 
notion of a mathematical proof by incorporating randomness and interaction. 
This change was initially motivated by the intriguing possibility of
zero-knowledge proofs, that is, proofs that carry no extra knowledge other than
being convincing. The result of Goldreich, Micali, and Wigderson [27] showed that any NP
statement can be proved in a zero-knowledge (ZK) manner, making ZK proofs a 
central tool for cryptographic protocol design; this was later extended by Ben-Or 
et al. [8] to any language in PSPACE. All these fundamental results, however, 
relied on the assumption that one-way functions exist. Ostrovsky and Wigder- 
son [46] showed that (similar) computational assumptions are indeed inherent 
for non-trivial zero-knowledge. 

Motivated by the goal of achieving unconditionally secure zero-knowledge 
proofs for NP, Ben-Or, Goldwasser, Kilian and Wigderson [9] introduced the 
model of multi-prover interactive proofs (MIP) and presented a perfect ZK pro- 
tocol for any statement that is provable in the MIP model. Shortly after, Babai, 
Fortnow, and Lund [6] showed that in fact any language in NEXP can be proved 
in the MIP model. Fortnow, Rompel, and Sipser [23] studied the MIP model 
further and observed that as a proof system it is equivalent to another model 
in which an oracle encodes a probabilistically checkable proof (PCP) which is 
queried by an efficient randomized verifier. (The PCP oracle is often identified 
with the proof string defined by its truth-table, in which case the output domain 
of the oracle is referred to as the PCP alphabet.) The difference between a prover 
and a PCP oracle is that a prover can keep an internal state, and hence its answer 
to a given question can depend on other questions. Therefore, soundness against 
a PCP oracle is potentially easier to achieve than soundness against a malicious 
prover. This line of work culminated in the celebrated PCP theorem [43]. 


Zero-Knowledge PCPs. In this work we study zero-knowledge proofs in the PCP 
model. A zero-knowledge PCP (ZK-PCP) is defined similarly to a standard 
PCP, except that the view of any (possibly malicious) verifier can be efficiently 
simulated up to a small statistical distance. It is instructive to note that zero- 
knowledge PCPs are incomparable to traditional ZK proofs: since the PCP model 
makes the prover less powerful, achieving soundness may become easier whereas 
achieving zero-knowledge may become harder. 

The original ZK protocol of [27] for NP implicitly relies on an honest-verifier
zero-knowledge PCP for the NP-complete problem of 3-coloring of graphs. In this PCP
the prover takes any 3-coloring of the input graph, randomly permutes the 3 colors,
and writes down the colors as the PCP string. The verifier chooses a random edge,
reads the colors of the vertices of that edge, and accepts iff the colors are different.
This ZK-PCP has two disadvantages: (1) it is only zero-knowledge against hon- 
est verifiers (a malicious verifier can learn whether the colors of two non-adjacent 
nodes are identical), and (2) the soundness error is very large: 1 — 1/m where 
m is the number of edges. Dwork et al. ma relying on the PCP theorem [BM], 
improved the ZK-PCP implicit in [27] in both directions. Their construction im- 
plies a ZK-PCP for NP of polynomial length and with a constant alphabet size 
such that: (1) the PCP is zero-knowledge against verifiers who ask any pair of 
queries (but not more), and (2) the soundness error is constant. However, the 
soundness error of this ZK-PCP could not be easily reduced further while main- 
taining ZK against malicious verifiers. Furthermore, it could not be made zero- 
knowledge against arbitrary polynomial-time verifiers, simply because it has poly- 
nomial length and a malicious verifier could read the entire proof string. 
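
The honest-verifier PCP for 3-coloring described above can be sketched as follows. This is a minimal illustration under our own naming (`make_proof`, `verifier_accepts`, `acceptance_probability`, and `best_cheating_probability` are hypothetical helpers), with vertices numbered from 0 and colors taken from {0, 1, 2}.

```python
from fractions import Fraction
from itertools import product
import random


def make_proof(coloring, rng):
    """Honest prover: randomly permute the 3 colors and write them down."""
    perm = [0, 1, 2]
    rng.shuffle(perm)
    return [perm[c] for c in coloring]


def verifier_accepts(edges, proof, rng):
    """Honest verifier: read the two endpoints of one uniformly random edge."""
    u, v = rng.choice(edges)
    return proof[u] != proof[v]


def acceptance_probability(edges, proof):
    """Exact probability that the honest verifier accepts a fixed proof."""
    good = sum(1 for u, v in edges if proof[u] != proof[v])
    return Fraction(good, len(edges))


def best_cheating_probability(num_vertices, edges):
    """Maximum acceptance probability over all proof strings (brute force)."""
    return max(acceptance_probability(edges, list(p))
               for p in product(range(3), repeat=num_vertices))
```

For the complete graph on four vertices, which is not 3-colorable and has m = 6 edges, the best cheating proof violates exactly one edge, so the soundness error is 1 − 1/m = 5/6. An honest verifier sees only two distinct, randomly permuted colors, whereas a verifier that queries two non-adjacent vertices learns whether their colors coincide.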

Kilian, Petrank, and Tardos [40] were the first to explicitly study the power of
ZK-PCPs with malicious verifiers. Their work shows how to get around the above
limitations, resulting in two kinds of ZK-PCPs with security against malicious
verifiers. For the case of languages in NP, [40] obtain a PCP of polynomial
length over a binary alphabet which is zero-knowledge with negligible soundness
error against malicious verifiers who are limited to asking at most a fixed
polynomial number p(|x|) of queries, whereas the honest verifier asks only
polylog(|x|) queries to verify the PCP. (The length of the PCP string can be
polynomially larger than p(|x|).) We call such PCPs bounded-query ZK. For the
case of languages in NEXP, a scaled-up version of this construction yields a
ZK-PCP in which honest verifiers are efficient (i.e. run in poly(|x|) time), but
soundness holds against arbitrary polynomial-time verifiers. However, the PCP
oracle in this case cannot be computed in polynomial time even for languages in
NP. (By “computable in polynomial time” we mean that the oracle outputs a
polynomial-time computable function of its secret randomness, the input x, the
NP-witness, and the verifier’s query.) This is inherent to the approach of
[40], as it requires the entropy of the PCP oracle to be bigger than the number
of queries made by a malicious verifier.

The above state of affairs leaves open the following natural question. 


Main Question: Are there efficiently computable PCPs for NP which 
are statistically zero-knowledge against any polynomial-time verifier? 


An additional motivation to study the question above comes from the recent 
line of work on cryptography in an extended model of interaction with
“tamper-proof hardware tokens” [38,44,14,29,34,41,33]. This model allows the parties to
generate and exchange tamper-proof hardware tokens which are simply circuits 
(with or without internal state) that are accessible only as a black-box. Indeed, 
an efficient ZK-PCP for NP is equivalent to a statistical zero-knowledge proof for 
NP in this model where the only message sent to the verifier is a single stateless 
token. The stateless nature of the PCP oracle (inside the token) would make 
such a protocol secure against “resetting attacks” [13]. With this motivation in 
mind, we revisit the feasibility question of efficient ZK-PCPs for NP. 


1 This formulation of the result of Dwork et al. is due to [40].
2 Our Results


Our main theorem provides a negative answer to the main question above for 
the case of nonadaptive (honest) verifiers whose queries can only depend on 
their randomness and the input x but not on the prover’s answers (so all the 
queries can be prepared and asked in one round). This theorem may be viewed 
as supporting the conjecture that efficient ZK-PCPs for NP do not exist. 

In the setting of bounded-query ZK-PCPs, we revisit the construction of [40]
and simplify it considerably. Our contribution is a simple combinatorial
construction of “locking schemes”, which were the main tool developed in [40]
and used in both of their constructions for NP and NEXP.

Finally, motivated by a line of work on the power of black-box constructions 
in cryptography, we show that efficient bounded-query ZK-PCPs can be used 
to make the sublinear-communication zero-knowledge argument construction of 
Kilian black-box. Kilian’s construction assumes the existence of a collision- 
resistant hash function, but it uses the hash function in a non-black-box way. 
We also obtain constant-round variants of this result and an unconditional non- 
interactive variant in the Random Oracle Model. In the following we describe 
our results more formally and put them in the proper context.


2.1 Efficient Nonadaptive ZK-PCPs 
We prove the following negative result about the existence of ZK-PCPs for NP. 


Theorem 1 (Main Theorem). If there exists an efficiently computable PCP 
for NP with a nonadaptive honest verifier, constant soundness error, and zero- 
knowledge against arbitrary polynomial-time verifiers, then the polynomial-time 
hierarchy collapses. 


What we prove is actually more general than the statement of Theorem 1.
Namely, we show that any language with an efficient ZK-PCP of polynomial
Shannon entropy (see Remark 4) and a nonadaptive verifier is in coAM, and
Theorem 1 follows by the result of [12]. Also, we only require the
zero-knowledge to hold against nonadaptive verifiers (of arbitrary polynomial
time); see footnote 2.

We emphasize that even though the zero-knowledge property of ZK-PCPs is
defined in a statistical fashion, our main theorem above does not follow from the
classical result of Fortnow, Aiello, and Hastad [122] who proved that SZK ⊆
AM ∩ coAM. The reason is that although achieving zero-knowledge in the
PCP model is harder, achieving soundness in this model is potentially easier
(see footnote 3). Therefore the languages which possess efficient ZK-PCPs (as
far as we know) are not necessarily included in SZK. Also recall that if one
does not require the

2 The requirement that the honest verifier be nonadaptive is a restriction on our
Theorem 1, but only requiring the zero-knowledge to hold against nonadaptive
verifiers makes our result stronger.

3 The latter comparison manifests itself in the following characterizations: it holds
that PCP(poly, poly) = MIP = NEXP while IP = PSPACE ⊆ EXP.
PCP oracle to be efficiently computable, by the result of [40] all of the languages
in NEXP (including NP) do have (statistical) ZK-PCPs.

Using Theorem 1 itself, we can extend Theorem 1 to the case of adaptive
(honest) verifiers, as long as the total length of the prover’s answers returned in
an honest PCP verification is O(log n) bits (see Corollary 7).


Ideas and Tools. At a high level, the proof of Theorem 1 uses ideas from many
previous influential works [260201] and tools from old and new results in the
context of constant-round proofs [31,28,36]. The main challenge is how to
force an untrusted prover to extract a PCP oracle from the simulator and run
the honest verifier against this PCP. The soundness of this protocol follows from
the soundness of the original PCP. To get completeness, we need to extract
this PCP in such a way that it is “close” to an actual accepting PCP, and this
is where we use the efficiency of the PCP and its bounded entropy. Section 3 is
dedicated to describing the main result formally and the main ideas behind it.
See the full version of the paper for a formal description of our AM protocol.


Motivation and Related Work. A recent line of work in cryptography
[38,44,14,29,34,47,33] studies the possibility of obtaining secure protocols
in an extended model of interaction in which the parties are allowed to ex- 
change more than just classical bits: the parties are allowed to locally construct 
a (stateful or stateless) circuit, put it inside a tamper-proof token, and send it to 
another party. The receiver of a token (in this model) is allowed only to use it as 
a black-box. Namely, she is only allowed to give inputs to the token and receive 
the output. (If the token is stateful, asking the same query twice might lead to 
different answers.) Designing protocols in this model is made challenging by the 
fact that a receiver of a token has no guarantee that the token is indeed well 
formed. The work of Goyal et al. [34] showed that any two-party functionality
(e.g. zero-knowledge proof) can be carried out securely in this model without 
relying on computational assumptions. Unfortunately the solution of [34] uses 
stateful tokens, which makes it vulnerable to “resetting attacks”. Namely, there 
is no security guarantee if a malicious party receiving a token can reset it to its 
initial state, say, by cutting off its power. 

In another line of research, Kalai and Raz [37] introduced the Interactive PCP
(IPCP) model which is a hybrid between the two-prover and the PCP models. 
In the IPCP model the verifier interacts with a prover and a PCP oracle. Note 
that when the prover and the PCP oracle are efficiently computable, the IPCP 
model becomes a special case of the tamper-proof token model in which the 
prover sends a stateless token (computing the PCP) to the verifier. 

Although Kalai and Raz [37] introduced the IPCP model for the purpose of 
optimizing the PCP length at the cost of a small amount of interaction with the
prover, Goyal, Ishai, Mahmoody, and Sahai [33] showed that the IPCP model is 
also interesting for cryptographic purposes in the context of achieving uncondi- 
tional security in the tamper-proof token model. It was shown in [33] that uncon- 
ditional (statistical) ZK proofs for NP exist in the IPCP model, and moreover 
the prover and the PCP oracle can be implemented efficiently given a witness 
$w$ for $x \in L$. The verifier in the protocol of [33] exchanges only four messages
with the prover. A main question left open in [33] was whether there exists any
protocol that avoids such interaction between the verifier and the prover alto- 
gether (i.e. the verifier only interacts with the PCP oracle). It is easy to see 
that the latter question is equivalent to our main question above. Namely, any
positive answer to our main question implies a proof system in which all the 
communication between the prover and the verifier consists of a single stateless 
token sent to the verifier which hides the circuit computing the PCP oracle and 
can convince the verifier about the truth of the input statement in a ZK manner. 

Therefore, if efficient ZK-PCPs for NP exist, they would lead (without 
any computational assumptions) to “noninteractive” statistical zero-knowledge 
proofs for NP using tamper-proof hardware with the extra feature of being re- 
sistant against resetting attacks, since the used token (which computes the PCP 
oracle) is stateless. 


2.2 Simplifying Bounded-Query ZK-PCPs 


Our second contribution is a simplification of the ZK-PCP construction of Kilian 
et al. [40]. The construction of [40] starts from the weakly-sound bounded-query 
ZK-PCP of [19] and compiles it into a PCP which is zero-knowledge against 
malicious verifiers of bounded query complexity. The weakly-sound PCP of [19]
is zero-knowledge against any $k$ (possibly adaptive) queries, but suffers from
the soundness error $1 - 1/\mathrm{poly}(k)$. The main tool introduced and employed
in the compiler of [40] is called a “locking scheme”, which is an analogue of a
commitment scheme in the PCP model. In a locking scheme a sender holds a
secret $w$ and randomly encodes it into an oracle $\sigma_w$ that can be accessed by the
receiver $R$ (denoted as $R^{\sigma_w}$). The efficient receiver should not be able to learn
any information about $w$ through its oracle access to $\sigma_w$. On the other hand,
the sender can later send a key to the receiver to decommit the value $w$. The
protocol should guarantee that the sender is not able to change his mind about
the value $w$ after constructing the oracle $\sigma_w$.⁴

Kilian et al. [40] gave an elegant way of using locking schemes to convert a
ZK-PCP with $1 - 1/\mathrm{poly}(k)$ soundness error into a standard ZK-PCP of constant or
even negligible error. Unfortunately, the locking scheme of [40], which forms the
main technical ingredient of their ZK-PCP constructions, is quite complicated to
describe and analyze (pages 6 to 12 there) and uses ad-hoc algebraic techniques.


Motivation. Most applications of ZK-PCPs considered in this work either require 
the stronger unbounded variant (see Section 2.1) or alternatively can rely on an
honest-verifier variant (see Section 2.3), which is easier to realize. However, effi- 
cient bounded-query ZK-PCPs with security against malicious verifiers can also 
be motivated by natural application scenarios. For instance, one can consider 


4 In other words, a locking scheme can be thought of as a commitment scheme with 
statistical security guarantees and minimal interaction such that during its commit- 
ment phase the sender sends only a single tamper-proof token (containing the oracle 
$\sigma_w$) to the receiver.
the goal of distributing an NP-witness among many servers in a way that
simultaneously supports a very efficient verification (corresponding to the work of the
honest verifier) and secrecy in the presence of a large number of colluding servers 
(corresponding to the query bound of a malicious verifier). One can also consider 
a “time-lock zero-knowledge proof” in which a stateless hardware token contains 
an embedded witness which can be very quickly validated but requires a lot of 
time to extract. Another motivation behind our simpler locking schemes comes 
from the line of work aiming at simplifying PCP constructions and making them 
combinatorial. The main algebraic and technical components in the final PCP 
construction of Kilian et al. [40] are (1) the PCP theorem of [3,4] (which comes
in through the construction of [19]) and (2) the locking scheme of [40]. The first 
(more important) component was considerably simplified by Dinur, and here we 
give a simplified version of the second component. (For a more extensive survey 
of this line of research see and the references therein.) 

In the full version of this paper, we formally present and analyze a simple 
combinatorial construction of a locking scheme which can be viewed as a nonin- 
teractive implementation of Naor’s commitment scheme [45] in the PCP model. 
In the following we describe the main idea. 


Technique. We start by reviewing Naor’s commitment scheme. In this commitment
scheme, the parties have access to a pseudorandom generator
$f\colon \{0,1\}^n \to \{0,1\}^{3n}$ and the protocol works as follows:

The receiver chooses a random “shift” $r \leftarrow \{0,1\}^{3n}$ and sends it to the sender.
The sender, who holds a secret input bit $b$, chooses a random seed $s \leftarrow \{0,1\}^n$
and sends $f(s) + b \cdot r = t$ to the receiver (the addition and multiplication are
componentwise over the binary field). In the decommitment phase the sender
simply sends $(b, s)$ to the receiver, and the receiver makes sure that $f(s) + b \cdot r = t$
holds to accept the decommitted value.
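
As a concrete illustration, the commitment and decommitment steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than part of the construction: SHAKE-256 stands in for the generator $f$ (its pseudorandomness is an assumption), the seed length `N_BYTES` is an arbitrary choice, and the helper names (`prg`, `receiver_shift`, `commit`, `verify`) are ours.

```python
import hashlib
import secrets

N_BYTES = 16  # seed length n = 128 bits (illustrative choice, not from the paper)


def prg(seed: bytes) -> bytes:
    """Stand-in for f: {0,1}^n -> {0,1}^{3n}; SHAKE-256 is an assumed PRG."""
    return hashlib.shake_256(seed).digest(3 * len(seed))


def xor(a: bytes, b: bytes) -> bytes:
    """Componentwise addition over the binary field."""
    return bytes(x ^ y for x, y in zip(a, b))


def receiver_shift() -> bytes:
    """Receiver's first message: a random shift r of length 3n."""
    return secrets.token_bytes(3 * N_BYTES)


def commit(bit: int, r: bytes):
    """Sender: pick a random seed s and output t = f(s) + bit * r, plus the key s."""
    s = secrets.token_bytes(N_BYTES)
    t = prg(s)
    if bit == 1:
        t = xor(t, r)
    return t, s


def verify(t: bytes, r: bytes, bit: int, s: bytes) -> bool:
    """Receiver: accept the opening (bit, s) iff f(s) + bit * r == t."""
    expected = prg(s)
    if bit == 1:
        expected = xor(expected, r)
    return expected == t
```

A sender would call `commit(b, r)` after receiving `r`, keep `s` private, and later reveal `(b, s)` for the receiver to pass to `verify`.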

The binding property holds because the support set of $f$ is of size at most
$|f(\{0,1\}^n)| \le 2^n$, and a random shift $r \leftarrow \{0,1\}^{3n}$ with overwhelming probability
of at least $1 - 2^n \cdot 2^n \cdot 2^{-3n} = 1 - 2^{-n}$ will have the property that
$f(\{0,1\}^n) \cap (f(\{0,1\}^n) + r) = \emptyset$. Thus for such “good” $r$, by sending $t$ to the receiver
the sender will be bound to at most one possible value of $b$ (regardless of the
structure of the function $f$).
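
The probability bound above is a union bound over pairs of seeds, which can be spelled out as follows (the set name $\mathrm{Bad}$ is ours, introduced only for this calculation):

```latex
\begin{align*}
\mathrm{Bad} &= \{\, r \in \{0,1\}^{3n} : \exists\, s, s' \in \{0,1\}^n,\ f(s) + f(s') = r \,\},\\
\Pr_{r \leftarrow \{0,1\}^{3n}}[\, r \in \mathrm{Bad} \,]
  &\le \sum_{s, s' \in \{0,1\}^n} \Pr_{r}[\, r = f(s) + f(s') \,]
   = 2^n \cdot 2^n \cdot 2^{-3n} = 2^{-n}.
\end{align*}
```

For any $r$ outside $\mathrm{Bad}$, no value $t$ can be written both as $f(s)$ and as $f(s') + r$, which is exactly the disjointness condition used for binding.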

On the other hand, the hiding property of the scheme reduces in a black-box
way to the pseudorandomness of $f(U_n)$. Namely, if an efficient receiver $R$ can
distinguish between $f(s) + 0 \cdot r$ and $f(s) + 1 \cdot r$, another efficient algorithm $D$ who
uses $R$ internally is able to distinguish $f(U_n)$ from a random value $U_{3n}$. Thus
it holds that if the function $f$ is random, the scheme will be statistically hiding
against receivers who ask at most poly(n) oracle queries to $f$. The reason is that
a random function $f$ mapping $\{0,1\}^n$ to random values in $\{0,1\}^{3n}$ is statistically
indistinguishable from a truly random function as long as the distinguisher is
bound to ask at most $2^{o(n)}$ queries to $f$.

The above observation about the hiding property of Naor’s commitment scheme
means that if, in the second round of the commitment phase, the sender chooses
$f$ to be a truly random function and sends $f(s) + b \cdot r$ to the receiver as well as
(providing oracle access to) $f(\cdot)$, then we get a secure (inefficient) commitment
scheme in the interactive PCP model without relying on any computational
assumption.⁵ In our construction of locking schemes we show how to eliminate the
initial message $r$ of the receiver and emulate the role of this shift $r$ by a few
more queries asked by the receiver and more structure in the locking oracle.


2.3 Black-Box Sublinear ZK Arguments 


Kilian [39], relying on the PCP construction of [3,4],⁶ proved that assuming the
existence of exponentially-hard collision-resistant hash functions (CRH) and 2- 
message statistically-hiding commitments, one can construct a (6-message) sta- 
tistical ZK argument for NP with polylog(n) communication complexity (where 
$n$ is the input length). Later on, Damgård et al. [17] showed that 2-message
statistically-hiding commitments can be obtained from any CRH, which made 
the existence of exponentially hard CRH sufficient for the construction of Kilian. 
Micali [43] showed how to make Kilian’s protocol noninteractive in the random
oracle model. The above constructions make a non-black-box use of the under- 
lying collision-resistant hash function. 

Our third contribution is to obtain black-box constructions of sublinear ZK 
arguments for NP by using bounded-query efficient ZK-PCPs for NP. Namely, 
we observe that the bounded-query ZK-PCP of can be employed to get an 
alternative to the ZK argument of Kilian [39] for NP which uses the underlying
CRH function as a black box. (Our protocols are in fact fully black-box [49], in 
the sense that the security reduction makes a black-box use of the adversary, 
and have black-box simulators. ) 


Theorem 2 (Black-Box Sublinear ZK Arguments). Let H be any family of 
collision-resistant hash functions. Using H only as a black-box, one can construct 
a constant-round ZK argument system for NP with negligible soundness error 
and communication complexity sublinear in the witness size. Furthermore: 

— For the case of an honest verifier, the zero knowledge is statistical, the round 
complexity is 4 messages, and the protocol is public coin. 

— For the case of malicious-verifier zero knowledge, the round complexity is 5 
messages, and the proof of security requires that the family of CRH be secure 
against non-uniform adversaries. 

— If the family of CRH is secure against adversaries running in time mlO, 
then the communication complexity can be made polylogarithmic in the wit- 
ness size for both honest verifier and malicious verifier settings. 

— In the random oracle model, there exists an unconditionally secure non- 
interactive statistical zero knowledge argument system for NP with negligible 
soundness error and polylogarithmic communication complexity. 


We prove Theorem 2 in the full version; below we describe the main ideas.


5 Note that the random oracle $f(\cdot)$ is not efficiently computable. The work of
presents an efficient construction of unconditionally secure commitments in the IPCP 
model. 

6 The more advanced PCP constructions of were not known at that time.
Motivation and Related Work. Our black-box construction of Theorem 2 is motivated
by the recent line of work on studying the power of black-box cryptographic
constructions vs. that of non-black-box ones (e.g. [24,18,35,15,16,48,50,32]). The
goal in this line of work is to understand whether the non-black-box application 
of an underlying primitive P which is used in a construction of another (per- 
haps more complicated) primitive Q is necessary or a black-box construction 
exists as well. The reason behind studying this question is that the black-box 
constructions are generally much more efficient (since the source of the non- 
black-box-ness usually is an extremely inefficient Cook-Levin reduction to an 
NP-complete language). Moreover, black-box constructions are capable of also 
incorporating any physical implementations of the employed primitive P in the 
implementation of Q. 


Technique. Kilian’s argument system, when only required to be sound (and not 
ZK), has only four messages and uses the hash function as a black-box. The first 
three messages can be easily made ZK, and it is only the last message from the 
prover which potentially carries some knowledge. In this last message, the prover 
reveals some portions of the PCP. To retain the zero-knowledge property, Kilian 
substitutes the last message (of his 4-message protocol) by a zero-knowledge 
sub-protocol through which the prover convinces the verifier that he could have 
revealed the correct portion of the PCP in a way that would cause the verifier 
to accept. The latter zero-knowledge sub-protocol makes non-black box use of 
the code of the hash function used in the protocol. Thus, our goal is to remove 
the zero-knowledge sub-protocol performed at the end.⁷

In order to make Kilian’s 6-message ZK argument black-box, we need to 
know more details about its first 3 rounds. The first message is simply the 
description of the hash function sent to the prover. Then by using the given 
hash function and applying a Merkle tree to the PCP the prover hashes down
the PCP into a short string which is sent to the verifier as a commitment to the whole
PCP. With some care, one can make the hash value carry negligible information
about the PCP. The third message (from the verifier) consists of the indices of
symbols which the PCP verifier chooses to read from the PCP. The prover, in
the 4th message, reveals the answers to the PCP queries by revealing the relevant
paths of the Merkle tree to the verifier. The committed hash value of the PCP
(the second message) together with the collision-resistance property of the hash 
function prevent the prover from changing his mind about the PCP that he 
committed to in the second message. Thus the soundness of the PCP implies 
the soundness of the argument system. To keep the last message of this protocol 
zero-knowledge, as we said, Kilian’s prover will not simply reveal the relevant 
preimages, but instead would prove in a zero-knowledge manner, that he knows 
a set of preimages that would make the PCP verifier accept. 


7 Barak and Goldreich also employ Kilian’s approach to get a 4-message universal
argument without zero-knowledge. Similarly to Kilian’s protocol, to make their pro- 
tocol zero-knowledge (or just witness indistinguishable) [7] use the hash function in 
a non-black-box way. 
Our main intuitive observation is that if instead of using the PCP of [3,4] one
feeds (a direct product version of) the bounded-query ZK-PCP of [19] to the
construction of Kilian, then the prover can safely reveal the relevant preimages 
in the last step of the basic 4-message argument of Kilian and this will not 
hurt the zero-knowledge property. The key point is that although the employed 
PCP is zero-knowledge only against bounded-query PCP verifiers, since we are 
in the prover/verifier setting, the prover can control how many queries of the 
PCP are read by the verifier, and therefore the bounded-query ZK property 
of the used PCP will suffice for the argument system to be zero-knowledge. 
Because this construction based on collision-resistant hash functions is
black-box, it immediately implies an unconditional construction of sublinear
ZK arguments in the random oracle model. Using the
transformation of one can eliminate the interaction using the random 
oracle and obtain an unconditional construction of sublinear ZK arguments for 
NP in the random oracle model. To obtain the result for malicious verifiers 
(and negligible soundness error), we apply a variant of the Goldreich-Kahan protocol [25]
in which both prover and verifier use statistically hiding commitments. See the full
version of the paper for a formal description of the protocol and its analysis. 


Using NIZK? A possible alternative way to get a ZK argument (without using 
ZK-PCPs) is to use noninteractive zero-knowledge (NIZK) proofs for NP [10].⁸
To do so, the prover and the verifier should perform a coin-tossing protocol along 
with the first 3 messages of the basic variant of Kilian’s argument system, and 
this will allow the prover to be able to send a noninteractive zero-knowledge 
message to the verifier in his last message which proves to the verifier that the 
prover knows the right preimages of the hash function. This approach benefits 
from having only 4 messages exchanged, but it still uses the code of the hash 
function in a non-black-box way, and moreover, one needs to assume the existence 
of NIZK proofs for NP (in addition to the assumption that exponentially-hard 
collision-resistant hash functions exist). 


3 On Nonadaptive Efficient ZK-PCPs 


In this section we give a formal statement of Theorem 1 and more details about
the intuition behind its proof. See the full version for a complete proof. 


Definition 3. In a probabilistically checkable proof (PCP) $\Pi = (P, V)$ for a
language $L$, the prover $P = \{\pi_x\}$ is an ensemble of distributions over proof
oracles, $V$ is an efficient verifier accessing a proof $\pi \leftarrow \pi_x$, and the following
properties hold.


— Completeness: For every $x \in L$, it holds that $\Pr_{\pi \leftarrow \pi_x}[V^{\pi}(x) = 1] \ge 2/3$.
— Soundness: If $x \notin L$, then for every oracle $\hat\pi$ we have $\Pr[V^{\hat\pi}(x) = 0] \ge 2/3$.


8 This variant was pointed out to us by Rafael Pass [47]. 
The verifier $V$ is nonadaptive if the queries it asks only depend on its own private
randomness and the input $x$. (A nonadaptive verifier can prepare all of its oracle
queries in advance and ask them in one “round”.) For the case where $L \in$ NP,
a PCP $\Pi$ is called efficient if there is an NP-relation $R_L(x, w)$ associated with
$L$ with the following efficiency property. Given any input $x$ and witness $w$ such
that $(x, w) \in R_L$, one can efficiently sample a circuit computing a PCP oracle
$\pi \leftarrow \pi_x$.⁹


Remark 4 (The Entropy of PCPs). For an input $x \in L$, the entropy of the PCP
oracle $\pi_x$ is defined similarly to the entropy of any random variable. Note that
for a fixed input $x \in L$ (and witness $w$ for $x \in L$, if the PCP is efficient), the
distribution of $\pi_x$ is determined by the prover’s private randomness. Since there
are at most $2^{\mathrm{poly}(k)}$ circuits of size $k$, any PCP oracle computable by circuits
of size at most $k = \mathrm{poly}(n)$ (regardless of whether these circuits are generated
efficiently or not) has entropy at most $\log(2^{\mathrm{poly}(k)}) \le \mathrm{poly}(k) \le \mathrm{poly}(n)$, simply
because any finite random variable $\mathbf{x}$ has Shannon entropy at most
$H(\mathbf{x}) \le \log|\mathrm{Supp}(\mathbf{x})|$.


Definition 5. Let $\Pi = (\{\pi_x\}, V)$ be a PCP for the language $L$. $\Pi$ is called
(statistical) zero-knowledge (ZK) if for every malicious poly(n)-time verifier
$\hat V$, there is an efficient simulator Sim which runs in (expected) poly(n)-time
and for a sequence of inputs $x \in L$ the output of $\mathrm{Sim}(x)$ is $\mathrm{neg}(|x|)$-close to
$\mathrm{View}\langle \pi_x, \hat V \rangle(x)$.¹⁰ A simulator Sim is called straight-line if it uses $\hat V$ only as a
black-box and moreover it just outputs the result of a single interaction with $\hat V$.
Namely, the simulator Sim interacts with $\hat V$ without knowing its secret
randomness $r_{\hat V}$, and its output is distributed statistically close to the view of $\hat V^{\pi_x}$.


Theorem 1 directly follows from Remark 4 and Theorem 6 below.


Theorem 6. Let $\Pi = (\{\pi_x\}, V)$ be a ZK-PCP for a language $L$ with a
nonadaptive verifier $V$. If (for every fixed input $x$) the PCP oracle $\pi_x$ has entropy
at most $\mathrm{poly}(|x|)$, then $L \in$ AM $\cap$ coAM. Moreover $L \in$ BPP if the simulator
is straight-line.¹¹

Corollary 7. Let $\Pi = (\{\pi_x\}, V)$ be a ZK-PCP for a language $L$ with oracle
entropy at most poly(n), and suppose the total length of the PCP answers
returned to the verifier during a single verification is at most $O(\log n)$ bits. Then
(regardless of the adaptivity of the verifier), it holds that $L \in$ AM $\cap$ coAM.
(Also $L \in$ BPP if the simulator is straight-line.)


9 More formally, in that case we shall index the oracle distributions $\{\pi_{x,w}\}$ by both
the input and the witness. Then the completeness should hold for all $x \in L$ when
the prover uses any witness $w$ for $x \in L$.

10 In the case of efficient ZK-PCPs, the zero-knowledge property should hold regardless
of which witness $w$ (for $x \in L$) is used by the prover to generate the oracle.

11 Bounded-query ZK-PCPs of [40] and its predecessors [27,19] all have straight-line
simulators.
Note that in Corollary 7 there is no bound on the length of the queries of the
verifier, and in particular it can be applied to cases in which the number of queries
of $V$ is $O(\log n)$ and the PCP answers (alphabet) are of constant size while the
length of the PCP is exponential, $2^{\mathrm{poly}(n)}$ (which makes the length of the queries
of the verifier at least poly(n)).


Proof (Proof of Corollary 7). Since the total length of oracle answers is $O(\log n)$
bits, we can modify the verifier $V$ into another equivalent verifier $V'$ as follows:
the new verifier $V'$ tries to ask a superset of the queries that $V$ would ask, but
$V'$ asks its queries in a nonadaptive way. In particular $V'$ enumerates all the
possible answers that $V$ might get from the oracle, continues the verification in
each case, and prepares all the possible queries of $V$ at the beginning. There are
at most $2^{O(\log n)} = \mathrm{poly}(n)$ many possibilities caused by different PCP answers
in a verification, thus there will be at most poly(n) many queries asked by $V'$.
After getting the answers, $V'$ can emulate $V$ internally and decide as $V$ would.
The completeness, soundness, and zero-knowledge of $V'$ are inherited from those
of $V$ by definition.
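
The enumeration in this proof can be illustrated with a small Python sketch. The interface is hypothetical and ours: an adaptive verifier (with its input and randomness already fixed) is modeled as a function `next_query` that maps the tuple of answers received so far to the next query, and `tree_walk_query` is a toy example of such a verifier, not one taken from the paper.

```python
from itertools import product


def all_possible_queries(next_query, num_queries, alphabet=(0, 1)):
    """Collect every query the adaptive verifier could ask, over all answer sequences."""
    queries = set()
    for answers in product(alphabet, repeat=num_queries):
        for j in range(num_queries):
            queries.add(next_query(answers[:j]))
    return queries


def run_nonadaptively(next_query, decide, num_queries, oracle, alphabet=(0, 1)):
    """Ask all possible queries up front, then emulate the adaptive verifier offline."""
    table = {a: oracle(a)
             for a in all_possible_queries(next_query, num_queries, alphabet)}
    answers = ()
    for _ in range(num_queries):
        answers += (table[next_query(answers)],)
    return decide(answers)


def tree_walk_query(prev_answers):
    """Toy adaptive verifier: walk down a complete binary tree guided by the answers."""
    depth = len(prev_answers)
    offset = sum(bit << (depth - 1 - i) for i, bit in enumerate(prev_answers))
    return (2 ** depth - 1) + offset
```

With answers of total length $\ell$ bits there are $2^{\ell}$ answer sequences, so the up-front query set has size at most $\ell \cdot 2^{\ell}$, which is poly(n) when $\ell = O(\log n)$.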


3.1 Main Ideas and Framework 


Here we describe the main ideas behind the proof of Theorem 6. Our AM
protocols for $L$ and $\overline{L}$ follow the same general framework. (The AM protocol
for $\overline{L}$ is the more interesting case, since it implies the collapse of the hierarchy
in case $L$ is NP-complete.)

First we show that if a bounded-entropy ZK-PCP for $L$ has a straight-line
simulator, then $L$ (and $\overline{L}$) can be decided by an efficient BPP algorithm $D_L$.
At a very high level, this step uses ideas from by looking at a particular 
malicious verifier (in our case a repeated version of the honest verifier) and using 
its interaction with the straight-line simulator to decide the language. Since the 
key ideas already appear in the case of straight-line simulation, in Section 3.2
below we start by only describing this basic case.


Beyond Straight-Line Simulation. For the case of general (statistical)
simulation, we show how to emulate the efficient algorithm $D_L$ above with the help
of an untrusted prover. In particular, we first show how to emulate $D_L$ with the
help of some advice $\alpha_x$ sampled from a specific distribution,¹² and then we will
show how to get this advice $\alpha_x$ from an (untrusted) prover through a constant
round protocol GetAdv. The latter protocols are implemented following similar
frameworks introduced by Feigenbaum and Fortnow [20] (and extended in the
followup works of [11,2]) in the context of studying the possibility of worst-case
to average-case reductions for NP. Our protocol, however, is more complicated
and uses recent and old sampling protocols from [31,28,36].


12 Here we are using the term “advice” in a nonstandard way, because the advice
distribution $\alpha_x$ depends on the input $x$ (rather than only depending on the input
length $|x|$).
3.2 The Case of Straight-Line Simulation


In this section we present the BPP algorithm for L assuming that the ZK-PCP 
has a perfect straight-line simulator. This special case already captures the main 
ideas, and we refer the reader to the full version for the general case. 

Since the PCP verifier $V$ is assumed to be nonadaptive, we can assume w.l.o.g.
that $V$ permutes its queries $a_1, \dots, a_q$ randomly before querying the oracle.


The Intuition. The general framework is to use the simulator Sim to find a “good
enough” oracle $\hat\varphi$ and run a fresh instance of the verifier $V$ against this oracle.
This way, the correctness of our algorithm to decide membership in $L$ follows
from the soundness of the original PCP system. The challenge is to sample
the oracle $\hat\varphi$ in a way that makes the verifier accept in case the input $x$ is in
$L$. Suppose we run the simulator over the “mildly malicious” verifier who only
repeats several (independent) executions of the verifier: $(V^1, \dots, V^k)$. Then, in
case $x \in L$, the simulated transcript of all of these executions $(V^1, \dots, V^k)$
will be accepted. To define the oracle $\hat\varphi$, relying on the straight-line nature
of the simulator, we can fix any simulated partial transcript for $(V^1, \dots, V^i)$
(for $i \in [k]$) and ask Sim to answer any new query only conditioned on the
simulated transcript of $(V^1, \dots, V^i)$. (Even though $\hat\varphi$ is a randomized oracle, its
randomness can be fixed independently of the final verification that is executed
over $\hat\varphi$.) The main intuition is that since the entropy of the simulated transcript
for $(V^1, \dots, V^k)$ is bounded, for most $i \in [k]$ the simulated transcript of $V^i$
has very small entropy (conditioned on the earlier transcripts), and relying on the
non-adaptivity of $V$, all of its queries
could be thought of as the “first query”, and this way the oracle $\hat\varphi$ (defined
above) behaves very close to the actual “oracle” of the simulated transcript of
$V^i$, which leads to an accept. The formal argument follows.


Notation. Let $V^{[k]}$ be an execution of $k$ independent copies of the PCP verifier
$V$. By $V^i$ we refer to the $i$-th execution of $V$ in $V^{[k]}$ (i.e. $V^{[k]} = (V^1, \dots, V^k)$).
$V^{[k]}$ is a potentially malicious verifier whose view $\mathrm{View}\langle \pi_x, V^{[k]} \rangle$ is assumed to
be perfectly simulated by the straight-line simulator Sim (when given access to
$V^{[k]}$). The view $\mathrm{View}\langle \pi_x, V^{[k]} \rangle$ is composed of $k$ random seeds $r^1, \dots, r^k$ for $V$
and $k$ transcripts $\tau^1, \dots, \tau^k$ such that each $\tau^i = (a^i_1, b^i_1, \dots, a^i_q, b^i_q)$ is a partial
transcript where $\{a^i_1, \dots, a^i_q\}$ are the queries asked by $V$ using the randomness $r^i$
and $b^i_j = \pi_x(a^i_j)$ is (supposedly) the corresponding returned oracle answer. We will
only use the fact that Sim simulates $(\tau^1, \dots, \tau^k)$ correctly and will ignore the fact
that this is simulated jointly with the random seeds $(r^1, \dots, r^k)$. Also, since we will
use Sim only over $V^{[k]}$ and some input $x$, for simplicity in the following we will
use Sim to denote $\mathrm{Sim}(V^{[k]}, x)$. Also, let $m = \mathrm{poly}(n) \ge H(\pi_x)$ be the upper
bound on the PCP entropy for every $x \in L \cap \{0,1\}^n$, and let $\epsilon = 1/\mathrm{poly}(n)$
be a parameter controlling the error of the BPP algorithm $D_L$. The formal
description of the algorithm $D_L$ is as follows.


Construction 8. BPP Algorithm $D_L$. Set $k = m \cdot (3q/\epsilon)^2$ where $q$ is the query
complexity of $V$ and $\epsilon$ is the error parameter.

1. Randomly choose $i \leftarrow [k]$, and use Sim to generate $(\tau^1, \dots, \tau^{i-1})$ as a prefix of
$\mathrm{View}\langle \pi_x, V^{[k]} \rangle$.

2. Choose fresh randomness $r^i$ for the verifier $V$ and generate the queries
$a^i_1, \dots, a^i_q$ using $r^i$.

3. Using the simulator Sim, answer each of the queries $a^i_j$ as follows to get
the answer $b^i_j$: extend the execution of the straight-line simulator Sim
assuming that $a^i_j$ is the first query of $V^i$, conditioned on $(\tau^1, \dots, \tau^{i-1})$ being
generated already for $(V^1, \dots, V^{i-1})$.

4. Finally output whatever $V$ decides over the view $(r^i, a^i_1, b^i_1, \dots, a^i_q, b^i_q)$.
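
To make the choice of $k$ concrete, the following small Python sketch computes the number of repetitions and the resulting completeness loss $3q\sqrt{m/k}$ that appears in the analysis below. The function names and the rounding up to an integer are our own illustrative choices, not part of the construction.

```python
import math


def repetitions(m: float, q: int, eps: float) -> int:
    """Number of copies k = m * (3q / eps)^2, rounded up to an integer."""
    return math.ceil(m * (3 * q / eps) ** 2)


def completeness_loss(m: float, q: int, k: int) -> float:
    """The term 3q * sqrt(alpha) with alpha = m / k."""
    alpha = m / k
    return 3 * q * math.sqrt(alpha)
```

With this choice $\alpha = m/k \le (\epsilon/3q)^2$, so the loss is at most $\epsilon$; note that $k$ stays polynomial in $n$ whenever $m$, $q$, and $1/\epsilon$ are.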


Lemma 9. If $\Pi$ has soundness $1 - \delta_s$, then $D_L$ will reject every $x \notin L$ with
probability $\ge 1 - \delta_s$, and if $\Pi$ has completeness $1 - \delta_c$, then $D_L$ will accept every
$x \in L$ with probability $\ge 1 - (\delta_c + \epsilon)$.


Proof (Proof of Lemma 9). We study the cases $x \notin L$ and $x \in L$ separately.


When $x \notin L$. The final verification of the algorithm of Construction 8 is
run against a randomized oracle, but this oracle can be sampled and fixed
independently of the randomness of the verifier, thus the soundness of the
PCP implies the soundness of $D_L$. More formally, define the randomized
oracle $\varphi^i = (\pi_x \mid \tau^1, \dots, \tau^{i-1})$ according to the distribution of the PCP oracle $\pi_x$
conditioned on the view of $V^{[i-1]}$. Define the oracle $\hat\varphi^i$ as a randomized oracle
that for every new query $a$ samples a fresh instance of the oracle $\varphi \leftarrow \varphi^i$
and then answers $a$ using $\varphi$. Based on Construction 8, $D_L$ is indeed running the
verifier $V$ against an instance $\hat\varphi \leftarrow \hat\varphi^i$ of this oracle and outputs $V^{\hat\varphi}(x)$. Thus,
since $x \notin L$, by the soundness of $V$, with probability at least $1 - \delta_s$ it holds
that $V^{\hat\varphi}(x) = 0$. Note that if instead of asking all of the queries of the verifier
“as the first query” we simply ask the simulator to simulate the whole view, the
answers might not be chosen according to any fixed oracle independently of the
randomness of $V$, and $V$ might accept even though $x \notin L$.


When $x \in L$. Informally speaking, the verifier accepts in this case for the
following two reasons: (1) If we sample the view of the final verification simply as the
view of $V^i$ as an extension of $V^{[i-1]}$, all sampled by the simulator Sim (i.e. using
the oracle $\varphi^i$ rather than $\hat\varphi^i$), then it will be an accepted view by the definition
of the simulator; moreover (2) since the verifier is nonadaptive and permutes its
queries, any of its queries can be thought of as the first query. More formally,
consider the following two mental experiments:


1. Sample $(\tau^1, \dots, \tau^{i-1})$ and $\varphi \leftarrow \varphi^i$ (as defined above) and sample $a^i_1, \dots, a^i_q$
(by sampling $r^i$). Then execute $q$ versions of the verifier $V$ as follows. In the
$j$’th execution ask the queries from $\varphi$ in this order: $(a^i_j, \dots, a^i_q, a^i_1, \dots, a^i_{j-1})$
and receive the answers $(b^i_j, \dots, b^i_q, b^i_1, \dots, b^i_{j-1})$.

2. Do the same as above, but here in the $j$’th execution first sample a fresh oracle
$\varphi_j \leftarrow \varphi^i$ and then ask the queries in the order $(a^i_j, a^i_{j+1}, \dots, a^i_q, a^i_1, \dots, a^i_{j-1})$
to get the answers $(c^i_j, c^i_{j+1}, \dots, c^i_q, c^i_1, \dots, c^i_{j-1})$.

Claim. Let $\alpha = m/k$. Then for every $j \in [q]$, it holds that $\Pr[b^i_j = c^i_j] \ge 1 - 3\sqrt{\alpha}$.


Now we prove Claim 3.2. A crucial point is that the queries of $V$ are already
permuted randomly, and therefore rotations inside each execution will still produce
a random execution of $V$ (although these random executions are correlated).
Therefore by symmetry, it suffices to prove Claim 3.2 only for the first
execution of the two experiments. Since $H(\pi_x) \le m$ and the $a^i_j$’s are sampled
independently of $\pi_x$, we have:

$$ m \ge H(\pi_x) \ge \sum_{i \in [k]} \sum_{j \in [q]} H(b^i_j \mid \tau^1, \dots, \tau^{i-1}, a^i_1, b^i_1, \dots, a^i_j) \ge \sum_{i \in [k]} H(b^i_1 \mid \tau^1, \dots, \tau^{i-1}, a^i_1). $$

By averaging over $i$ and using the definition of the conditional entropy it holds
that, for a uniformly random $i \leftarrow [k]$,
$H(b^i_1 \mid \tau^1, \dots, \tau^{i-1}, a^i_1) \le m/k = \alpha$. By another averaging
argument, with probability at least $1 - \sqrt{\alpha}$ over sampling and fixing
$(i \leftarrow [k], \tau^1, \dots, \tau^{i-1}, a^i_1)$, it would hold that $H(b^i_1 \mid \tau^1, \dots, \tau^{i-1}, a^i_1) \le \sqrt{\alpha}$. We
use the following lemma to bound the collision probability when the Shannon
entropy is small.


Lemma 10. For every finite random variable $\mathbf{x}$ it holds that
$\Pr_{x_1, x_2 \leftarrow \mathbf{x}}[x_1 = x_2] \ge 1 - 1.45 \cdot H(\mathbf{x})$.


Proof. Let $C = \Pr_{x_1 \leftarrow \mathbf{x}, x_2 \leftarrow \mathbf{x}}[x_1 = x_2]$ be the collision probability of $\mathbf{x}$, let
$p_i = \Pr[\mathbf{x} = i]$, and let $H = H(\mathbf{x})$. By Jensen’s inequality,
$\sum_i p_i \log p_i \le \log \sum_i p_i^2$, so it holds that $H \ge \log(1/C)$ (where $\log(1/C)$ is
also known as the Rényi entropy). Therefore, using $e^{-x} \ge 1 - x$, we conclude that
$C \ge 2^{-H} = e^{-(\ln 2) \cdot H} \ge 1 - (\ln 2) \cdot H \ge 1 - 1.45 H$.
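
As a quick numerical sanity check of Lemma 10 (not a substitute for the proof), the two inequalities $C \ge 2^{-H}$ and $2^{-H} \ge 1 - 1.45H$ can be evaluated for concrete distributions. The helper names below are ours, and the small tolerance only absorbs floating-point rounding.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def collision_probability(p):
    """Probability that two independent samples from p coincide."""
    return sum(x * x for x in p)


def lemma10_holds(p, tol=1e-12):
    """Check C >= 2^{-H} and 2^{-H} >= 1 - 1.45 * H for the distribution p."""
    H = shannon_entropy(p)
    C = collision_probability(p)
    return C >= 2 ** (-H) - tol and 2 ** (-H) >= 1 - 1.45 * H - tol
```

For example, a heavily biased bit such as `[0.99, 0.01]` has small entropy and collision probability close to 1, which is the regime used in the proof of the claim.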


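By Lemma 10, the bound $H(b^i_1 \mid \tau^1, \dots, \tau^{i-1}, a^i_1) \le \sqrt{\alpha}$ implies
that the collision probability is at least $1 - 2\sqrt{\alpha}$, and since $c^i_1$ and $b^i_1$ are both
sampled from $(b^i_1 \mid \tau^1, \dots, \tau^{i-1}, a^i_1)$, we have $\Pr[c^i_1 = b^i_1] \ge 1 - 2\sqrt{\alpha}$. Claim 3.2
now follows by a union bound (adding the probability of at most $\sqrt{\alpha}$ that the
entropy bound fails).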

Claim 3.2 implies that the sampled $(r^i, a^i_1, b^i_1, \dots, a^i_q, b^i_q)$ in the algorithm $D_L$
(which is the same as using the first query/answer pairs of executions in the
second experiment) will also lead to accepting with probability at least
$1 - \delta_c - 3q\sqrt{\alpha} = 1 - (\delta_c + \epsilon)$.
Progression-Free Sets and Sublinear Pairing-Based
Non-Interactive Zero-Knowledge Arguments 


Helger Lipmaa 


Institute of Computer Science, University of Tartu, Estonia 


Abstract. In 2010, Groth constructed the only previously known sublinear- 
communication NIZK circuit satisfiability argument in the common reference 
string model. We optimize Groth’s argument by, in particular, reducing both the 
CRS length and the prover’s computational complexity from quadratic to quasi- 
linear in the circuit size. We also use a (presumably) weaker security assumption, 
and have tighter security reductions. Our main contribution is to show that the 
complexity of Groth’s basic arguments is dominated by the quadratic number 
of monomials in certain polynomials. We collapse the number of monomials to 
quasilinear by using a recent construction of progression-free sets. 


Keywords: Additive combinatorics, bilinear pairings, circuit satisfiability, non- 
interactive zero-knowledge, progression-free sets. 


1 Introduction 


By using a zero-knowledge proof, a prover can convince a verifier that some statement 
is true without leaking any side information. Due to the wide applications of zero- 
knowledge, it is of utmost importance to construct efficient zero-knowledge proofs. 
Non-interactive zero-knowledge (NIZK) proofs can be generated once and verified
many times by different verifiers, and are thus useful in applications like e-voting.

NIZK proofs (or arguments, that is, computationally sound proofs) cannot be con- 
structed in the plain model (that is, without random oracles or any trusted setup as- 
sumptions). Blum, Feldman and Micali showed in [4] how to construct NIZK proofs in 
the common reference string (CRS) model. During the last years, a substantial amount 
of research has been done towards constructing efficient NIZK proofs (and arguments). 
Since the communication complexity and the verifier’s computational complexity are 
arguably more important than the prover’s computational complexity (again, an NIZK 
proof/argument is generated once but can be verified many times), a special effort has 
been made to minimize these two parameters. 

One related research direction is to construct efficient NIZK proofs for NP-complete
languages. Given an efficient NIZK proof for an NP-complete language, one can hope
to construct NIZK proofs of similar complexity for the whole of NP, either by reduction
or by implicitly or explicitly using the developed techniques. In some NIZK proofs for the
NP-complete problem circuit satisfiability (Circuit-SAT), see Tbl. 1, the communication
complexity is sublinear in the circuit size. Micali proposed polylogarithmic-communication
NIZK arguments for all NP-languages, but they are based on the PCP
theorem (making them computationally unattractive) and on the random oracle model.
Table 1. Comparison of NIZK Circuit-SAT arguments with (worst-case) sublinear argument size.
|C| is the size of the circuit, G corresponds to 1 group element, and A/M/E/P corresponds to 1
addition/multiplication/exponentiation/pairing


CRS length|Argument length Verifier comp. 


Random-oracle based arguments 
olcia] olcia O(c M O(c a 
Knowledge-assumption based arguments from [15] 
olc PG 1 OCRE]  e(\c))M +0P 
O(|C|5)G o( o((C|>)E|O(C)M + O(|C|5)P 


(8|C| +8)M + 62P 
oD FIO(|C|)M + O(\C|3)P 
ro) FIO(|C|)M + O(|C|2)P 


Another NIZK argument for Circuit-SAT, proposed by Groth in 2009 [14], is also based
on the random oracle model. It is well known that some functionalities are secure in the
random oracle model and insecure in the plain model. As a safeguard, it is important
to design efficient NIZK proofs and arguments that do not rely on random oracles.
Given a fully-homomorphic cryptosystem [10], one can construct efficient NIZK
proofs for all NP-languages with communication that is linear in the witness size [16].
However, since the witness size can be linear in the circuit size, in the worst case the
corresponding NIZK proofs are not sublinear.

In 2010, Groth proposed the first (worst-case) sublinear-communication NIZK
Circuit-SAT argument in the CRS model. First, he constructed two basic arguments for
Hadamard product (the prover knows how to open commitments A, B and C to three
tuples a, b and c of dimension n, such that $a_i b_i = c_i$ for $i \in [n]$) and permutation (the
prover knows how to open commitments A and B to two tuples a and b of dimension
n, such that $a_{\varrho(i)} = b_i$ for $i \in [n]$). Groth's Circuit-SAT argument can then be seen
as a program in a programming language that has two primitive instructions, for Hadamard
product and permutation. Some of the public permutations depend on the circuit, while
the secret input tuples of the basic arguments depend on the values assigned to the input
and output wires of all gates according to a satisfying assignment. The basic arguments
then show that this wire assignment is internally consistent and indeed corresponds
to a satisfying input assignment. For example, Groth used one permutation argument
to verify that all input wires of all gates have been assigned the same values as the
corresponding output values of their predecessor gates.

In the basic variant of Groth's pairing-based Circuit-SAT argument, see Tbl. 1, the
argument has O(1) group elements, but on the other hand the CRS has $\Theta(|C|^2)$ group
elements, and the prover's computational complexity is dominated by $O(|C|^2)$ bilinear-
group exponentiations. A balanced version of Groth's argument has the CRS and argument
of $\Theta(|C|^{2/3})$ group elements and prover's computational complexity dominated
by $\Theta(|C|^{4/3})$ exponentiations. (See for more details on balancing. Basically, one
applies basic arguments on length-m inputs, m < n, n/m times in parallel.)
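
The exponents 2/3 and 4/3 follow from simple arithmetic. The sketch below uses a simplified cost model that is an assumption of this example (constants dropped): one basic argument on length-m inputs needs m^2 CRS elements and m^2 prover exponentiations and contributes one unit of argument length, and it is run n/m times with a shared CRS. Balancing the CRS length against the argument length then gives m of about n^(1/3).

```python
def balanced_costs(n, m):
    """Costs under the simplified model described above (constants dropped)."""
    runs = -(-n // m)  # ceil(n / m) parallel runs on length-m inputs
    return {
        "crs": m * m,            # one shared CRS for length-m inputs
        "argument": runs,        # O(1) group elements per run
        "prover": runs * m * m,  # m^2 exponentiations per run
    }

def balance(n):
    """Pick m with m^2 = n / m, that is, m = n^(1/3)."""
    m = max(1, round(n ** (1 / 3)))
    return m, balanced_costs(n, m)

m, costs = balance(10 ** 6)
print(m, costs)
```

For n = 10^6 this yields m = 100, a CRS and an argument of 10^4 = n^(2/3) units each, and 10^8 = n^(4/3) prover exponentiations, matching the balanced figures quoted above.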
We propose a new Circuit-SAT argument (see Sect. 3 for a description of the new
techniques, and subsequent sections for the actual argument) that is strongly related
to Groth's argument, but improves upon every step. We first propose more efficient
basic arguments. We then use them to construct a (slightly shorter) new Circuit-SAT
argument. In the basic variant, while the argument is again O(1) group elements, it
is one commitment and one Hadamard product argument shorter. Moreover, in Groth's
argument, every commitment consisted of 3 group elements while every basic argument
consisted of 2 group elements. In the new argument, most of the commitments consist of
2 group elements. Thus, we saved 3 group elements, reducing the argument size from 42
to 39 group elements, even taking into account that the new permutation argument has
higher communication complexity (12 instead of 5 group elements) than that of [15].

A balanced version of the new argument achieves the combined CRS and argument
of $\Theta(|C|^{1/2+o(1)})$ group elements. In the full version, we describe a zap for Circuit-
SAT that has communication complexity of $|C|^{1/2+o(1)}$ group elements, while Groth's
zap from has the communication complexity of $\Theta(|C|^{2/3})$ group elements. We
also use much more efficient asymmetric pairings instead of symmetric ones, a (presumably)
weaker security assumption (Power Symmetric Discrete Logarithm instead of
Power Computational Diffie-Hellman), and have more precise security reductions. The
basic version of the new Circuit-SAT argument is more communication-efficient than
any prior-art random-oracle based NIZK argument, and it also has a smaller prover's
computational complexity than [22].

Our main contribution is to note that the complexity of Groth's basic arguments is
correlated to the number of monomials of a certain polynomial. In [15], this polynomial
has $O(n^2)$ monomials, where $n = 2|C| + 1$. We show that one can "collapse" the
$O(n^2)$ monomials to $O(N)$ monomials, where N is such that [N] has a progression-
free subset (that is, a subset that does not contain arithmetic progressions of length 3)
of odd integers of cardinality n. By a recent breakthrough of Elkin [9],
$N = O(n \cdot 2^{2\sqrt{2(2+\log_2 n)}}) = n^{1+o(1)}$. See Sect. 3 for further elaboration
on our techniques.

Thus, one can build an argument of O(1) group elements for every language in NP, 
by reducing the task at hand to a Circuit-SAT instance. Obviously, one can often de- 
sign more efficient tailor-made protocols, see for some follow-up work. In par- 
ticular, used our basic arguments to construct a non-interactive range proof with 
communication of O(1) group elements, while used our techniques to design a 
new basic argument to construct a non-interactive shuffle. (See [6] for a previous use of 
additive combinatorics in the construction of zero-knowledge proofs.) 

Due to the lack of space, several proofs have been deferred to the full version [20]. 


2 Preliminaries 


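Let $[n] = \{1, 2, \ldots, n\}$. Let $S_n$ be the set of permutations from [n] to [n]. Let
$a = (a_1, \ldots, a_n)$. Let $a \circ b$ denote the Hadamard (entry-wise) product of a and b, that is, if
$c = a \circ b$, then $c_i = a_i b_i$ for $i \in [n]$. If $y = h^x$, then $\log_h y := x$. Let $\kappa$ be the security
parameter. If $0 < \lambda_1 < \cdots < \lambda_i < \cdots < \lambda_n = \mathrm{poly}(\kappa)$, then
$\Lambda = (\lambda_1, \ldots, \lambda_n) \subset \mathbb{Z}$
is an $(n, \kappa)$-nice tuple. We abbreviate probabilistic polynomial-time as PPT. If $\Lambda_1$ and
$\Lambda_2$ are subsets of some additive group ($\mathbb{Z}$ or $\mathbb{Z}_p$ in this paper), then
$\Lambda_1 + \Lambda_2 = \{\lambda_1 + \lambda_2 : \lambda_1 \in \Lambda_1 \wedge \lambda_2 \in \Lambda_2\}$
is their sum set and
$\Lambda_1 - \Lambda_2 = \{\lambda_1 - \lambda_2 : \lambda_1 \in \Lambda_1 \wedge \lambda_2 \in \Lambda_2\}$ is
their difference set [25]. If $\Lambda$ is a set, then
$k\Lambda = \{\lambda_1 + \cdots + \lambda_k : \lambda_i \in \Lambda\}$ is an iterated
sumset, $k \cdot \Lambda = \{k\lambda : \lambda \in \Lambda\}$ is a dilation of $\Lambda$, and
$2\hat{\ }\Lambda = \{\lambda_1 + \lambda_2 : \lambda_1 \in \Lambda \wedge \lambda_2 \in \Lambda \wedge \lambda_1 \ne \lambda_2\} \subseteq \Lambda + \Lambda$
is a restricted sumset. (See [25].)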
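
As a small illustration of the set operations just defined, the following sketch computes them for a toy set. The function names and the example set {1, 3, 7} are chosen here for illustration only.

```python
def sumset(a, b):
    """A + B = {x + y : x in A, y in B}."""
    return {x + y for x in a for y in b}

def difference_set(a, b):
    """A - B = {x - y : x in A, y in B}."""
    return {x - y for x in a for y in b}

def iterated_sumset(a, k):
    """kA = {x_1 + ... + x_k : x_i in A}."""
    result = {0}
    for _ in range(k):
        result = sumset(result, a)
    return result

def dilation(a, k):
    """k . A = {k * x : x in A}."""
    return {k * x for x in a}

def restricted_sumset(a):
    """Sums of two distinct elements of A."""
    return {x + y for x in a for y in a if x != y}

lam = {1, 3, 7}
print(sorted(sumset(lam, lam)))        # iterated sumset 2A
print(sorted(dilation(lam, 2)))        # dilation 2 . A
print(sorted(restricted_sumset(lam)))  # restricted sumset
```

For this set the dilation {2, 6, 14} and the restricted sumset {4, 8, 10} are disjoint and together make up the full sumset, which is the kind of separation exploited later in the paper.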

Let $\mathcal{G}_{bp}(1^\kappa)$ be a bilinear group generator that outputs a description of a bilinear
group $gk := (p, G_1, G_2, G_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa)$, such that p is a $\kappa$-bit prime, $G_1$, $G_2$ and
$G_T$ are multiplicative cyclic groups of order p, and $\hat{e} : G_1 \times G_2 \to G_T$ is a bilinear map
(pairing) such that $\forall a, b \in \mathbb{Z}$ and $g_t \in G_t$, $\hat{e}(g_1^a, g_2^b) = \hat{e}(g_1, g_2)^{ab}$. If $g_t$ generates
$G_t$ for $t \in \{1, 2\}$, then $\hat{e}(g_1, g_2)$ generates $G_T$. Deciding the membership in $G_1$, $G_2$
and $G_T$, group operations, the pairing $\hat{e}$, and sampling the generators are efficient, and
the descriptions of the groups and group elements are $O(\kappa)$ bits long each. Well-chosen
asymmetric pairings (with no efficient isomorphism between $G_1$ and $G_2$) are much
more efficient than symmetric pairings (where $G_1 = G_2$). For $\kappa = 128$, the current
recommendation is to use an optimal (asymmetric) Ate pairing over a subclass of
Barreto-Naehrig curves [2]. In that case, at security level of $\kappa = 128$, an element of
$G_1$/$G_2$/$G_T$ can be represented in respectively 512/256/3072 bits.

A (tuple) commitment scheme $(\mathcal{G}_{com}, \mathrm{Com})$ in a bilinear group consists of two PPT
algorithms: a randomized CRS generation algorithm $\mathcal{G}_{com}$, and a randomized
commitment algorithm Com. Here, $\mathcal{G}^t_{com}(1^\kappa, n)$, $t \in \{1, 2\}$, produces a CRS $ck_t$, and
$\mathrm{Com}^t(ck_t; a; r)$, with $a = (a_1, \ldots, a_n)$, outputs a commitment value A in $G_t$ (or in
$G_t^b$ for some $b > 1$). We open $\mathrm{Com}^t(ck_t; a; r)$ by outputting a and r.

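A commitment scheme $(\mathcal{G}_{com}, \mathrm{Com})$ is computationally binding in group $G_t$, if for
every non-uniform PPT adversary $\mathcal{A}$ and positive integer $n = \mathrm{poly}(\kappa)$, the probability

$$\Pr\left[ ck_t \leftarrow \mathcal{G}^t_{com}(1^\kappa, n),\ (a_1, r_1, a_2, r_2) \leftarrow \mathcal{A}(ck_t) :
(a_1, r_1) \ne (a_2, r_2) \wedge \mathrm{Com}^t(ck_t; a_1; r_1) = \mathrm{Com}^t(ck_t; a_2; r_2) \right]$$

is negligible in $\kappa$. A commitment scheme $(\mathcal{G}_{com}, \mathrm{Com})$ is perfectly hiding in group $G_t$,
if for any positive integer $n = \mathrm{poly}(\kappa)$ and $ck_t \in \mathcal{G}^t_{com}(1^\kappa, n)$ and any two messages
$a_1, a_2$, the distributions $\mathrm{Com}^t(ck_t; a_1; \cdot)$ and $\mathrm{Com}^t(ck_t; a_2; \cdot)$ are equal.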

A trapdoor commitment scheme has three additional efficient algorithms: (a) a trapdoor
CRS generation algorithm that inputs t, n and $1^\kappa$, and outputs a CRS $ck^*$ (that has the
same distribution as $\mathcal{G}^t_{com}(1^\kappa, n)$) and a trapdoor td, (b) a randomized trapdoor commitment
that takes $ck^*$ and a randomizer r as inputs and outputs the value $\mathrm{Com}^t(ck^*; 0; r)$,
and (c) a trapdoor opening algorithm that takes $ck^*$, td, a and r as an input and outputs
an $r'$ such that $\mathrm{Com}^t(ck^*; 0; r) = \mathrm{Com}^t(ck^*; a; r')$.

Let $R = \{(C, w)\}$ be an efficiently computable binary relation such that $|w| =
\mathrm{poly}(|C|)$. Here, C is a statement, and w is a witness. Let $L = \{C : \exists w, (C, w) \in R\}$
be an NP-language. Let n be some fixed input length $n = |C|$. For fixed n, we have
a relation $R_n$ and a language $L_n$. A non-interactive argument for R consists of the
following PPT algorithms: a common reference string (CRS) generator $\mathcal{G}_{crs}$, a prover
$\mathcal{P}$, and a verifier $\mathcal{V}$. For $crs \leftarrow \mathcal{G}_{crs}(1^\kappa, n)$, $\mathcal{P}(crs; C, w)$ produces an argument $\psi$. The
verifier $\mathcal{V}(crs; C, \psi)$ outputs either 1 (accept) or 0 (reject).

A non-interactive argument $(\mathcal{G}_{crs}, \mathcal{P}, \mathcal{V})$ is perfectly complete, if $\forall n = \mathrm{poly}(\kappa)$,

$$\Pr[crs \leftarrow \mathcal{G}_{crs}(1^\kappa, n),\ (C, w) \leftarrow R_n : \mathcal{V}(crs; C, \mathcal{P}(crs; C, w)) = 1] = 1.$$
A non-interactive argument $(\mathcal{G}_{crs}, \mathcal{P}, \mathcal{V})$ is (adaptively) computationally sound, if for all
non-uniform PPT adversaries $\mathcal{A}$ and all $n = \mathrm{poly}(\kappa)$, the probability

$$\Pr[crs \leftarrow \mathcal{G}_{crs}(1^\kappa, n),\ (C, \psi) \leftarrow \mathcal{A}(crs) : C \notin L \wedge \mathcal{V}(crs; C, \psi) = 1]$$

is negligible in $\kappa$. The soundness is adaptive, that is, the adversary sees the CRS before
producing the statement C. A non-interactive argument $(\mathcal{G}_{crs}, \mathcal{P}, \mathcal{V})$ is perfectly witness-
indistinguishable, if for all $n = \mathrm{poly}(\kappa)$, if $crs \in \mathcal{G}_{crs}(1^\kappa, n)$ and $((C, w_0), (C, w_1)) \in R_n^2$,
then the distributions $\mathcal{P}(crs; C, w_0)$ and $\mathcal{P}(crs; C, w_1)$ are equal.

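A non-interactive argument $(\mathcal{G}_{crs}, \mathcal{P}, \mathcal{V})$ is perfectly zero-knowledge, if there exists a
PPT simulator $\mathcal{S} = (\mathcal{S}_1, \mathcal{S}_2)$, such that for all stateful non-uniform PPT adversaries $\mathcal{A}$
and $n = \mathrm{poly}(\kappa)$ (with td being the simulation trapdoor),

$$\Pr\left[\begin{array}{l} crs \leftarrow \mathcal{G}_{crs}(1^\kappa, n), \\ (C, w) \leftarrow \mathcal{A}(crs), \\ \psi \leftarrow \mathcal{P}(crs; C, w) : \\ (C, w) \in R_n \wedge \mathcal{A}(\psi) = 1 \end{array}\right]
= \Pr\left[\begin{array}{l} (crs; td) \leftarrow \mathcal{S}_1(1^\kappa, n), \\ (C, w) \leftarrow \mathcal{A}(crs), \\ \psi \leftarrow \mathcal{S}_2(crs; C, td) : \\ (C, w) \in R_n \wedge \mathcal{A}(\psi) = 1 \end{array}\right].$$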


3 Our Techniques 


We will first give a more precise overview of Groth’s Hadamard product and permuta- 
tion arguments [15], followed by a short description of our own main contribution. For 
the sake of simplicity, we will make several simplifications (like the use of symmetric 
pairings) during this discussion. 

Groth uses an additively homomorphic tuple commitment scheme that allows one to
commit to a long tuple, while the commitment itself is short. The best known such
commitment scheme is the extended Pedersen commitment scheme in a multiplicative cyclic
group of order p and a generator g, where the commitment of a tuple $a = (a_1, \ldots, a_n)$
with randomness $r_a$ is equal to $\mathrm{Com}(a; r_a) := g^{r_a} \cdot \prod_{i=1}^n g_i^{a_i}$. Here, one usually chooses
n random secrets $x_i \leftarrow \mathbb{Z}_p$, and then sets $g_i \leftarrow g^{x_i}$. Following [12], Groth chooses
a single random secret $x \leftarrow \mathbb{Z}_p$ and then sets $g_i \leftarrow g^{x^i}$. In this case, the commitment

$$\mathrm{Com}(a; r_a) = g^{r_a} \cdot \prod_{i=1}^n g_i^{a_i} = g^{r_a + \sum_{i=1}^n a_i x^i}$$

can be seen as a lifted polynomial $r_a + \sum_{i=1}^n a_i x^i$ in x, that the committer (who does
not know x) computes from n given values $g_i = g^{x^i}$. The first obvious benefit of this
commitment scheme is that it has a shorter secret (1 element instead of n elements).
Groth's Hadamard product argument, where the prover aims to convince the verifier
that the opening of $C = \mathrm{Com}(c; r_c)$ is equal to the Hadamard product of the openings
of $A = \mathrm{Com}(a; r_a)$ and $B = \mathrm{Com}(b; r_b)$ (that is, $a_i b_i = c_i \pmod p$ for $i \in [n]$),
is constructed as follows. Let $A = g^{r_a} \cdot \prod_{i=1}^n g_i^{a_i}$ be a commitment of a and
$B = g^{r_b} \cdot \prod_{i=1}^n g_i^{b_i}$ be a commitment of b by using the generator tuple $(g_1, \ldots, g_n)$. Let
$B' = g^{r_b} \cdot \prod_{i=1}^n g_{i(n+1)}^{b_i}$ be a commitment of b and $D = \prod_{i=1}^n g_{i(n+1)}$ be a commitment
of $1 = (1, \ldots, 1)$ by using a different generator tuple $(g_{n+1}, \ldots, g_{n(n+1)})$.

Groth's Hadamard product argument is based around the verification equation

$$\hat{e}(A, B') / \hat{e}(C, D) = \hat{e}(g, \psi) \qquad (1)$$

that (analogously to the Groth-Sahai proofs [17], though the latter only considers the
much simpler case n = 1) can be seen as a mapping of the required equality $a \circ b =
c \circ 1$ to another algebraic domain, with $\psi$ compensating for the use of a randomized
commitment scheme. One gets that $\hat{e}(A, B') / \hat{e}(C, D)$ is equal to $\hat{e}(g, g)^{F(x)}$, where

$$F(x) = \Big(r_a + \sum_{i=1}^n a_i x^i\Big) \Big(r_b + \sum_{i=1}^n b_i x^{i(n+1)}\Big) - \Big(r_c + \sum_{i=1}^n c_i x^i\Big) \Big(\sum_{i=1}^n x^{i(n+1)}\Big)$$

is the sum of two formal polynomials in x, $F(x) = F_{con}(x) + F_\psi(x)$, where $F_{con}(x) =
\sum_{i=1}^n (a_i b_i - c_i) x^{i(n+2)}$ is a constraint polynomial, spanned by the powers of x from
$\Lambda_{con} = \{i(n+2) : i \in [n]\}$, and

$$F_\psi(x) = r_a r_b + r_b \sum_{i=1}^n a_i x^i + \sum_{i=1}^n (r_a b_i - r_c) x^{i(n+1)} + \sum_{i=1}^n \sum_{j=1, j \ne i}^n (a_i b_j - c_i) x^{i + j(n+1)}$$

is an argument polynomial, spanned by the powers of x from $\Lambda_\psi = \{0\} \cup [n] \cup \{i(n+1) :
i \in [n]\} \cup \{i + j(n+1) : i, j \in [n] \wedge i \ne j\}$. One coefficient of $F_{con}(x)$ corresponds to
one constraint $a_i b_i = c_i$ that the honest prover has to satisfy, and is 0 if this constraint
is true. Thus, all coefficients of $F_{con}$ are equal to 0 iff the prover is honest.
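
The split of F(x) into a constraint part and an argument part can be checked by expanding the product with sparse polynomials. This is only a sketch under assumptions: exponents follow the description above (powers $i$ for a and c, powers $i(n+1)$ for b and for the all-ones tuple), coefficients are reduced modulo a toy prime q, and the function names are hypothetical.

```python
from collections import defaultdict

def multiply(f, g, q):
    """Multiply sparse polynomials {exponent: coefficient} modulo q."""
    h = defaultdict(int)
    for e1, c1 in f.items():
        for e2, c2 in g.items():
            h[e1 + e2] = (h[e1 + e2] + c1 * c2) % q
    return h

def expand_F(a, b, c, ra, rb, rc, q):
    """Nonzero coefficients of F(x) as described in the text."""
    n = len(a)
    A = {0: ra, **{i: a[i - 1] for i in range(1, n + 1)}}
    B2 = {0: rb, **{i * (n + 1): b[i - 1] for i in range(1, n + 1)}}
    C = {0: rc, **{i: c[i - 1] for i in range(1, n + 1)}}
    D = {i * (n + 1): 1 for i in range(1, n + 1)}
    left, right = multiply(A, B2, q), multiply(C, D, q)
    F = {e: (left[e] - right[e]) % q for e in set(left) | set(right)}
    return {e: v for e, v in F.items() if v}

def exponent_sets(n):
    """The sets Lambda_con and Lambda_psi for dimension n."""
    idx = range(1, n + 1)
    lam_con = {i * (n + 2) for i in idx}
    lam_psi = ({0} | set(idx) | {i * (n + 1) for i in idx}
               | {i + j * (n + 1) for i in idx for j in idx if i != j})
    return lam_con, lam_psi

q = 1019
a, b = [2, 3, 5], [7, 11, 13]
honest_c = [x * y % q for x, y in zip(a, b)]
lam_con, lam_psi = exponent_sets(3)
assert lam_con.isdisjoint(lam_psi)
assert set(expand_F(a, b, honest_c, 17, 23, 29, q)) <= lam_psi
cheating_c = [honest_c[0] + 1] + honest_c[1:]
assert 1 * (3 + 2) in expand_F(a, b, cheating_c, 17, 23, 29, q)
```

For n = 3 the argument set has n^2 + n + 1 = 13 exponents, which is the quadratic growth that the rest of the section sets out to remove.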

By using homomorphic properties of the commitment scheme, the prover constructs
the argument $\psi = g^{F_\psi(x)}$ as
$\psi = g^{r_a r_b} \cdots \prod_{i=1}^n \prod_{j=1, j \ne i}^n g_{i+j(n+1)}^{a_i b_j - c_i}$. This can be
done, since the prover (who knows how to open the commitments but does not know
the secret x) knows all coefficients $r_a r_b, \ldots, a_i b_j - c_i$. He also knows the generators
$g, \ldots, g_{i+j(n+1)}$ if the $O(n^2)$ generators $g_\ell$, for $\ell \in \Lambda_\psi$, are included in the CRS. Thus,
the CRS has $O(n^2)$ group elements and the computational complexity of the prover is
$O(n^2)$ bilinear-group exponentiations. On the other hand, the verifier's computational
complexity is $\Theta(1)$ pairings, since she only has to check Eq. (1).

For the soundness, one needs that when $a_i b_i \ne c_i$ for some $i \in [n]$, then a satisfying
$\psi$ cannot be computed from the elements $g^{x^\ell}$ that are in the CRS; otherwise, a dishonest
prover would be able to compute a satisfying argument. This means that for $i \in [n]$,
$g^{x^{i(n+2)}}$ should not belong to the CRS. To be certain that this is true, one needs

(a) that $g^{x^\ell}$ is in the CRS for values $\ell \in \Lambda_\psi$ but if $\ell \in \Lambda_{con}$, then $g^{x^\ell}$ does not belong
to the CRS (elements from 2 - A \ A are allowed), 

(b) an appropriate security assumption that states that computing $g^{F_\psi(x)}$ for $F_\psi =
\sum_{\ell \in \Lambda_\psi} \mu_\ell x^\ell$ is only possible if one knows all values $g^{x^\ell}$ for $\ell \in \Lambda_\psi$, and

(c) that $\Lambda_{con} \cap \Lambda_\psi = \emptyset$. (This is also a prerequisite for (a).)

One can guarantee (a) by the choice of the CRS. But also (c) is clearly true, since $\Lambda_{con}$
and $\Lambda_\psi$ do not intersect.

To finish off the whole argument, one has to define an appropriate security assumption
for (b). Since constructing sublinear NIZK arguments is known to be impossible
under standard assumptions (see Sect. 2), one of the underlying assumptions is a knowledge
assumption (PKE assumption, as in [15], see Sect. 5). The whole argument will
become (slightly!) more complex since all commitments and arguments also have to
include a knowledge component.

Groth's permutation argument is based on a very similar idea and has basically the
same complexities. The only major difference is that if the permutation is a part of the
prover's statement, then the verifier also has to perform O(n) bilinear-group multiplications.
Since Groth's Circuit-SAT argument consists of a very small (< 10) number of
Hadamard product and permutation arguments, it just inherits the complexities of
the basic arguments, as also seen from Tbl. 1, where, in the basic variation, |C| = n and
thus the CRS has $O(|C|^2)$ group elements, the argument length is 42 group elements,
the prover's computational complexity is $O(|C|^2)$ exponentiations, and the verifier's
computational complexity is dominated by $O(|C|)$ bilinear-group multiplications.

Groth's Circuit-SAT argument has several sub-optimal properties that are all inherited
from the basic arguments. While it has succinct communication and efficient verification,
its CRS of $O(|C|^2)$ group elements and prover's computation of $O(|C|^2)$
exponentiations (in the basic variant) seriously limit applicability. Recall that here
$n = 2|C| + 1$. A smaller problem is the use of different generators $(g_1, \ldots, g_n)$ and
$(g_{n+1}, \ldots, g_{n(n+1)})$ while committing to different elements.

We note that $F_{con}$ has n monomials (1 per every constraint $a_i b_i = c_i$ that an honest
prover must satisfy). On the other hand, $F_\psi$ has $O(n^2)$ distinct monomials, since
$i_1 + j_1(n+1) \ne i_2 + j_2(n+1)$ if $i_1, j_1, i_2, j_2 \in [n]$ and $(i_1, j_1) \ne (i_2, j_2)$. The number
of those monomials is the only reason why the CRS has $O(n^2)$ group elements and the
prover has to perform $O(n^2)$ bilinear-group exponentiations.

We now show how to collapse many of the unnecessary monomials into one, so
that the full argument still remains secure, obtaining a polynomial $F_\psi(x)$ that has only
$n^{1+o(1)}$ monomials. First, we generalize the underlying commitment scheme. We still
choose a single $x \leftarrow \mathbb{Z}_p$ and set $g_i \leftarrow g^{x^i}$, but we allow the indexes of n generators
$(g_{\lambda_1}, \ldots, g_{\lambda_n})$ that are used to commit to actually depend on the concrete argument,
with the main purpose to be able to obtain as small $\Lambda_\psi$ as possible, while still
guaranteeing that $F_{con} = 0$ iff the prover is honest, and that $\Lambda_{con} \cap \Lambda_\psi = \emptyset$. Assume that
$\Lambda = (\lambda_1, \ldots, \lambda_n)$ is an $(n, \kappa)$-nice tuple of integers, so $\lambda_n = \max_i \lambda_i$. Thus,

$$\mathrm{Com}(a; r_a) := g^{r_a} \prod_{i=1}^n g_{\lambda_i}^{a_i} = g^{r_a + \sum_{i=1}^n a_i x^{\lambda_i}}.$$

The polynomial $r_a + \sum_{i=1}^n a_i x^{\lambda_i}$ has degree (up to) $\lambda_n$, but it only has (up to) n + 1
non-zero monomials. We now start again with the verification equation Eq. (1), but this
time we assume that all A, B, C and D have been committed by using the same set of
generators $(g_{\lambda_1}, \ldots, g_{\lambda_n})$. Since
$F(x) = (r_a + \sum_{i=1}^n a_i x^{\lambda_i})(r_b + \sum_{i=1}^n b_i x^{\lambda_i}) - (r_c + \sum_{i=1}^n c_i x^{\lambda_i})(\sum_{i=1}^n x^{\lambda_i})$,
we get that $F(x) = F_{con}(x) + F_\psi(x)$, where

$$F_{con}(x) = \sum_{i=1}^n (a_i b_i - c_i) x^{2\lambda_i}, \qquad (2)$$

$$F_\psi(x) = r_a r_b + \sum_{i=1}^n (r_a b_i + r_b a_i - r_c) x^{\lambda_i} + \sum_{i=1}^n \sum_{j=1, j \ne i}^n (a_i b_j - c_i) x^{\lambda_i + \lambda_j}. \qquad (3)$$
Here, the powers corresponding to nonzero coefficients belong either to the set $\Lambda_{con} =
2 \cdot \Lambda := \{2\lambda_i : i \in [n]\}$ or to the set $\Lambda_\psi = \hat{\Lambda} := \{0\} \cup \Lambda \cup 2\hat{\ }\Lambda$, where $2\hat{\ }\Lambda :=
\{\lambda_i + \lambda_j : i, j \in [n] \wedge i \ne j\}$.

If the prover is honest (that is, $a_i b_i - c_i = 0$ for all i), then the coefficients $a_i b_i - c_i$
corresponding to the powers in the set $2 \cdot \Lambda$ are equal to 0. Therefore, an honest prover
can compute the argument $\psi = g^{F_\psi(x)}$ as
$\psi = g^{\sum_{\ell \in \hat{\Lambda}} \mu_\ell x^\ell} = \prod_{\ell \in \hat{\Lambda}} g_\ell^{\mu_\ell}$, where the
coefficients $\mu_\ell$ are known to the prover. This means that all elements $g_\ell$, $\ell \in \hat{\Lambda}$, have
to belong to the CRS, and thus the CRS contains at least $|\hat{\Lambda}| \le 2\lambda_n$ group elements.
Recall that in [15], one had to specify $O(n^2)$ elements in the CRS.
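
To see the collapse of monomials on a small instance, the sketch below computes the coefficients of Eqs. (2) and (3) directly and cross-checks them against a plain expansion of F(x). The tuple (1, 3, 7, 9), the toy modulus and the function names are assumptions made for this illustration.

```python
from collections import defaultdict

def split_coefficients(lam, a, b, c, ra, rb, rc, q):
    """Coefficients of F_con and F_psi, following Eqs. (2) and (3), modulo q."""
    f_con, f_psi = defaultdict(int), defaultdict(int)
    f_psi[0] = ra * rb % q
    n = len(lam)
    for i in range(n):
        f_con[2 * lam[i]] = (f_con[2 * lam[i]] + a[i] * b[i] - c[i]) % q
        f_psi[lam[i]] = (f_psi[lam[i]] + ra * b[i] + rb * a[i] - rc) % q
        for j in range(n):
            if j != i:
                e = lam[i] + lam[j]
                f_psi[e] = (f_psi[e] + a[i] * b[j] - c[i]) % q
    return dict(f_con), dict(f_psi)

def direct_F(lam, a, b, c, ra, rb, rc, q):
    """Expand (r_a + sum a_i x^l_i)(r_b + sum b_i x^l_i) - (r_c + sum c_i x^l_i)(sum x^l_i)."""
    A, B, C, D = {0: ra}, {0: rb}, {0: rc}, {}
    for l, ai, bi, ci in zip(lam, a, b, c):
        A[l], B[l], C[l], D[l] = ai, bi, ci, 1
    F = defaultdict(int)
    for e1, v1 in A.items():
        for e2, v2 in B.items():
            F[e1 + e2] = (F[e1 + e2] + v1 * v2) % q
    for e1, v1 in C.items():
        for e2, v2 in D.items():
            F[e1 + e2] = (F[e1 + e2] - v1 * v2) % q
    return dict(F)

q = 1019
lam = [1, 3, 7, 9]  # odd and progression-free
a, b = [2, 3, 5, 7], [11, 13, 17, 19]
c = [x * y % q for x, y in zip(a, b)]
f_con, f_psi = split_coefficients(lam, a, b, c, 17, 23, 29, q)
F = direct_F(lam, a, b, c, 17, 23, 29, q)
for e in set(F) | set(f_con) | set(f_psi):
    assert (f_con.get(e, 0) + f_psi.get(e, 0) - F.get(e, 0)) % q == 0
assert all(v == 0 for v in f_con.values())  # honest prover
print(sorted(f_psi))                        # exponents an honest prover needs
```

Here the sums 1 + 9 and 3 + 7 fall on the same exponent 10, so F_psi is spanned by only 10 powers of x, all below 2 * 9 = 18.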

For the soundness, we again need (a-c), as in the case of Groth's argument, to be
true. One can again guarantee (a) by the choice of the CRS, and one has to define a
reasonable security assumption (PKE assumption) for (b). Finally, achieving (c) is also
relatively easy. Namely, one can guarantee that $0 \notin 2 \cdot \Lambda$ and $\Lambda \cap 2 \cdot \Lambda = \emptyset$ by choosing
$\Lambda$ to be a set of odd¹ integers. It is almost as easy to guarantee that $2 \cdot \Lambda \cap 2\hat{\ }\Lambda = \emptyset$
as soon as one rewrites this condition as $2\lambda_k \ne \lambda_i + \lambda_j$ for $i \ne j$, and notices that
this is equivalent to requiring that no 3 elements of $\Lambda$ are in an arithmetic progression.
That is, $\Lambda$ is a progression-free set [25]. Thus, it is sufficient to assume that $\Lambda$ is a
progression-free set of odd integers.
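
The three disjointness requirements can be tested mechanically for small candidate sets. The following sketch uses hypothetical helper names and example sets chosen here for illustration.

```python
def is_progression_free(lam):
    """True iff no three distinct elements form an arithmetic progression."""
    s = set(lam)
    return not any(
        x < y and (x + y) % 2 == 0 and (x + y) // 2 in s
        for x in s for y in s
    )

def soundness_conditions(lam):
    """Check the requirements on 2 . Lambda discussed in the text."""
    lam_set = set(lam)
    doubled = {2 * x for x in lam_set}
    restricted = {x + y for x in lam_set for y in lam_set if x != y}
    return {
        "zero_not_in_2L": 0 not in doubled,
        "L_disjoint_2L": lam_set.isdisjoint(doubled),
        "2L_disjoint_restricted": doubled.isdisjoint(restricted),
    }

print(soundness_conditions([1, 3, 7, 9]))  # odd and progression-free
print(soundness_conditions([1, 3, 5]))     # odd, but 1, 3, 5 is a progression
print(soundness_conditions([1, 2, 5]))     # progression-free, but contains 2 = 2 * 1
```

The second and third examples each violate exactly one requirement, which shows why both oddness (or a similar restriction) and progression-freeness are asked for.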

Recall that the CRS length (and the prover's computational complexity) depend
on $|\hat{\Lambda}|$ and thus it is beneficial to have as small $|\hat{\Lambda}| \le 2\lambda_n$ as possible. This can
be guaranteed by upper bounding $\lambda_n$, that is, by finding as small $\lambda_n$ as possible
such that $[\lambda_n]$ contains a progression-free subset of odd integers of cardinality n.
To bound $\lambda_n$, we show in Sect. 4 (following a recent breakthrough of Elkin [9])
that any range $[N] = \{1, \ldots, N\}$ contains a progression-free set of odd integers of
size $n = \Omega(N (\log_2 N)^{1/4} / 2^{2\sqrt{2 \log_2 N}}) = N^{1-o(1)}$, and thus one can assume that
$\lambda_n = n^{1+o(1)}$. (One can obtain $\lambda_n = O(n \cdot 2^{2\sqrt{2(2+\log_2 n)}})$ by inverting a weaker
version of Elkin's result.) In the full version, we give another proof of this result that,
while based on Green and Wolf's exposition of [9], provides more details and is
slightly sharper. In particular, Elkin's progression-free set is efficiently constructible.
Groth's permutation argument uses similar ideas for a different choice of A, B,
C, and D, and thus also for a different set $\Lambda_\psi$. Unfortunately, if we use it with the
new generalized commitment scheme (that is, with general $\Lambda$), we obtain the guarantee
$a_{\varrho(i)} = b_i$ only if $\Lambda$ is a part of the Moser-de Bruijn sequence [23]. But then
$\lambda_n = O(n^2)$ and one ends up with a CRS of $O(n^2)$ group elements. We use the following
idea to get the same guarantees when $\Lambda$ is an arbitrary progression-free set of odd
integers. We show that if $\Lambda$ is a progression-free set of odd integers, then Groth's
permutation argument guarantees that $a_{\varrho(i)} = T_\Lambda(i, \varrho) \cdot b_i$, where $T_\Lambda(i, \varrho) \ge 1$ is an easily
computable and public integer. We use this result to show that for some separately
committed tuple $a^*$, $a^*_{\varrho(i)} = T_\Lambda(i, \varrho) \cdot b_i$ for $i \in [n]$. We then employ an additional product
argument to show that $a^*_i = T_\Lambda(\varrho^{-1}(i), \varrho) \cdot a_i$ for $i \in [n]$. Thus, $a_{\varrho(i)} = b_i$ for $i \in [n]$.
We obtain basic arguments that only use $\Theta(\lambda_n) = n^{1+o(1)}$ generators $\{g^{x^\ell} : \ell \in \hat{\Lambda}\}$.
This means that the CRS has $n^{1+o(1)}$ group elements and not $O(n^2)$ as in [15]. In both
basic arguments, the prover has to compute $\psi$ (which takes $O(n^2)$ scalar multiplications
or additions in $\mathbb{Z}_p$ and $n^{1+o(1)}$ bilinear-group exponentiations). As in [15], the prover's
computation can be optimized even further by using efficient multi-exponentiation
algorithms. The verifier has to only perform O(1) bilinear pairings. In the case of the
permutation argument, she also has to compute O(n) bilinear-group multiplications,
though the multiplications can be done offline if the permutation is fixed. Thus, the new
basic arguments are considerably more efficient than Groth's.

¹ Oddity is not strictly required. For $\Lambda \cap 2 \cdot \Lambda = \emptyset$ to hold, one can take
$\Lambda := \{(2i + 1) 2^{2j} : i, j \ge 0\}$, see OEIS sequence A003159. Dealing with odd integers is
however almost as good.

The soundness of the new product argument is based on two assumptions, a computational
assumption ($\Lambda$-PSDL, see Sect. 5) and a knowledge assumption ($\Lambda$-PKE, see
Sect. 5). Groth [15] used $[an^2]$-PKE (for a constant a) and $[an^2]$-CPDH (which is a
presumably stronger assumption than PSDL). Since $\Lambda$, $\hat{\Lambda}$ are small subsets of $[an^2]$,
our assumptions can be expected to be somewhat weaker in general. Finally, the security
reduction in the proof of the product argument takes time $O(t(\lambda_n))$ in our case and
$O(t(an^2))$ in Groth's case, where $t(m)$ is the time to factor a degree-m polynomial.


4 Progression-Free Sets 


A set of positive integers $\Lambda = \{\lambda_1, \ldots, \lambda_n\}$ is progression-free [25], if no three
elements of $\Lambda$ are in an arithmetic progression, that is, $\lambda_i + \lambda_j = 2\lambda_k$ only if $i = j = k$,
or equivalently, $2\hat{\ }\Lambda \cap 2 \cdot \Lambda = \emptyset$.

Let $r_3(N)$ denote the cardinality of the largest progression-free set that belongs
to [N]. For any N > 1, the set of integers in [N] that have no ternary digit equal
to 2 is progression-free. If $N = 3^k$, then there are $2^k - 1$ such integers, and thus
$r_3(N) = \Omega(N^{\log_3 2}) = \Omega(N^{0.63})$. Clearly, this set can be efficiently constructed. As
shown by Behrend in 1946 [3], this idea can be generalized to non-ternary bases, with
$r_3(N) = \Omega(N / (2^{2\sqrt{2 \log_2 N}} \cdot \log_2^{1/4} N))$. Behrend's result was improved in a recent
breakthrough by Elkin [9], who showed that $r_3(N) = \Omega(N \cdot \log_2^{1/4} N / 2^{2\sqrt{2 \log_2 N}})$.
We have included a proof of Elkin's result in the full version. Our proof is closely
based on but it has a sharper constant inside $\Omega$. Moreover, our proof is much
more detailed than that given in [13]. While both constructions employ the pigeonhole
principle, Elkin's methodology can be used to compute his progression-free set
in quasi-linear time $N \cdot 2^{O(\sqrt{\log N})}$, see [9]. On the other hand, Bourgain showed
that $r_3(N) = O(N \cdot (\log\log N / \log N)^{1/2})$, and recently Sanders showed that
$r_3(N) = O(N \cdot (\log\log N)^5 / \log N)$. Thus, according to Behrend and Elkin, the minimal
N such that $r_3(N) = n$ is $N = n^{1+o(1)}$, while according to Sanders, $N = \omega(n)$.
We need the progression-free subset to also consist of odd integers. For this, one
can take Elkin's set $\Lambda = \{\lambda_1, \ldots, \lambda_n\} \subset [N]$, and then use the set $2 \cdot \Lambda + 1 =
\{2\lambda_1 + 1, \ldots, 2\lambda_n + 1\}$. Clearly, if $\Lambda \subset [n^{1+o(1)}]$ then also $2 \cdot \Lambda + 1 \subset [n^{1+o(1)}]$.
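
The simple ternary construction and the map to odd integers are easy to try out. The sketch below implements only the elementary ternary set (not Behrend's or Elkin's denser constructions); the function names are chosen for this illustration, and the set is generated over the positive integers below 3^k.

```python
from itertools import combinations

def ternary_progression_free(k):
    """Positive integers below 3**k whose base-3 digits are all 0 or 1."""
    values = []
    for mask in range(1, 2 ** k):
        values.append(sum(3 ** pos for pos in range(k) if (mask >> pos) & 1))
    return sorted(values)

def is_progression_free(values):
    """True iff no two distinct elements have their midpoint in the set."""
    s = set(values)
    return all((x + y) % 2 or (x + y) // 2 not in s for x, y in combinations(s, 2))

def make_odd(values):
    """Map lambda to 2 * lambda + 1; arithmetic progressions are preserved both ways."""
    return [2 * v + 1 for v in values]

base = ternary_progression_free(4)
odd = make_odd(base)
assert len(base) == 2 ** 4 - 1
assert is_progression_free(base) and is_progression_free(odd)
assert all(v % 2 == 1 for v in odd)
print(base[:7], max(odd))
```

The affine map keeps the set progression-free because x + y = 2z holds exactly when (2x + 1) + (2y + 1) = 2(2z + 1), and it at most doubles the largest element (plus one).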


Theorem 1. Let $r_3^{odd}(N)$ be the size of the largest progression-free set in [N] that only
consists of odd integers. For any n, there exists $N = n^{1+o(1)}$, such that $r_3^{odd}(N) = n$.


5 Cryptographic Tools 


In this section, we generalize the PKE assumption from and then define two new 
cryptographic assumptions, PDL and PSDL, and prove that PSDL is secure in the 
generic group model. After that, we proceed to describe a generalization of Groth’s 
knowledge commitment scheme from and prove that it is computationally bind- 
ing under the PDL assumption. Groth proved in that his commitment scheme is 
computationally binding under the (potentially stronger) CPDH assumption. 


$\Lambda$-Power (Symmetric) Discrete Logarithm Assumption. Let $\Lambda$ be an $(n, \kappa)$-nice
tuple for some $n = \mathrm{poly}(\kappa)$. We say that a bilinear group generator $\mathcal{G}_{bp}$ is $\Lambda$-PDL
secure in group $G_t$ for $t \in \{1, 2\}$, if for any non-uniform PPT adversary $\mathcal{A}$,
$\Pr[gk := (p, G_1, G_2, G_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa),\ g_t \leftarrow G_t \setminus \{1\},\ x \leftarrow \mathbb{Z}_p :
\mathcal{A}(gk; (g_t^{x^\ell})_{\ell \in \{0\} \cup \Lambda}) = x]$ is
negligible in $\kappa$. Similarly, we say that a bilinear group generator $\mathcal{G}_{bp}$ is $\Lambda$-PSDL secure,
if for any non-uniform PPT adversary $\mathcal{A}$,

$$\Pr\left[ gk := (p, G_1, G_2, G_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa),\ g_1 \leftarrow G_1 \setminus \{1\},\
g_2 \leftarrow G_2 \setminus \{1\},\ x \leftarrow \mathbb{Z}_p : \mathcal{A}(gk; (g_1^{x^\ell}, g_2^{x^\ell})_{\ell \in \{0\} \cup \Lambda}) = x \right]$$

is negligible in $\kappa$. A version of the P(S)DL assumption in a non pairing-based group was
defined in [12]. Cheon showed in that if n is a prime divisor of p - 1 or p + 1,
then the [n]-PDL assumption can be broken by a generic adversary in
$O((\sqrt{p/n} + \sqrt{n}) \log p)$ group operations. Clearly, if the $\Lambda$-PSDL assumption is hard, then the
$\Lambda$-PDL assumption is hard in both $G_1$ and $G_2$. Moreover, if the bilinear group generator is
CPDH secure, then it is also P(S)DL secure. Therefore, by the results of [15], P(S)DL
holds in the generic group model.


Theorem 2. The $\Lambda$-PSDL assumption holds in the generic group model for any $(n, \kappa)$-
nice tuple $\Lambda$ given that $n = \mathrm{poly}(\kappa)$. Any successful generic adversary for $\Lambda$-PSDL
requires time $\Omega(\sqrt{p/\lambda_n})$ where $\lambda_n$ is the largest element of $\Lambda$.


$\Lambda$-Power Knowledge of Exponent Assumption ($\Lambda$-PKE). Abe and Fehr showed
in that no statistically zero-knowledge non-interactive argument for an NP-
complete language can have a "direct black-box" security reduction to a standard
cryptographic assumption unless $NP \subseteq P/poly$. (See also [11].) In fact, the soundness
of NIZK arguments (for example, of an argument that a perfectly hiding commitment
scheme commits to 0) is often unfalsifiable by itself. Similarly to Groth [15], we will
base our NIZK argument for circuit satisfiability on $\Lambda$-PKE, an explicit knowledge
assumption. This assumption was proposed by Groth (though only for $\Lambda = [n]$).

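Let $t \in \{1, 2\}$. For two algorithms $\mathcal{A}$ and $\mathcal{X}_\mathcal{A}$, we write $(y; z) \leftarrow (\mathcal{A} \| \mathcal{X}_\mathcal{A})(x)$ if
$\mathcal{A}$ on input x outputs y, and $\mathcal{X}_\mathcal{A}$ on the same input (including the random tape of $\mathcal{A}$)
outputs z. Let $\Lambda$ be an $(n, \kappa)$-nice tuple for some $n = \mathrm{poly}(\kappa)$. The bilinear group
generator $\mathcal{G}_{bp}$ is $\Lambda$-PKE secure in group $G_t$ if for any non-uniform PPT adversary $\mathcal{A}$
there exists a non-uniform PPT extractor $\mathcal{X}_\mathcal{A}$, such that

$$\Pr\left[\begin{array}{l} gk := (p, G_1, G_2, G_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa),\ g_t \leftarrow G_t \setminus \{1\},\ (\hat{\alpha}, x) \leftarrow \mathbb{Z}_p^2, \\
crs \leftarrow (gk; (g_t^{x^\ell}, g_t^{\hat{\alpha} x^\ell})_{\ell \in \{0\} \cup \Lambda}),\ (c, \hat{c}; r, (a_\ell)_{\ell \in \Lambda}) \leftarrow (\mathcal{A} \| \mathcal{X}_\mathcal{A})(crs) : \\
\hat{c} = c^{\hat{\alpha}} \wedge c \ne g_t^r \cdot \prod_{\ell \in \Lambda} g_t^{a_\ell x^\ell} \end{array}\right]$$

is negligible in $\kappa$. That is, if $\mathcal{A}$ (given access to crs that for a random $\hat{\alpha}$ contains both
$g_t^{x^\ell}$ and $g_t^{\hat{\alpha} x^\ell}$ iff $\ell \in \{0\} \cup \Lambda$) can produce c and $\hat{c}$ such that $\hat{c} = c^{\hat{\alpha}}$, then $\mathcal{X}_\mathcal{A}$ (given
access to crs and to the random coins of $\mathcal{A}$) can produce a tuple $(r, (a_\ell)_{\ell \in \Lambda})$ such that
$c = g_t^r \cdot \prod_{\ell \in \Lambda} g_t^{a_\ell x^\ell}$. Groth proved that the [n]-PKE assumption holds in the generic
group model; his proof can be straightforwardly modified to the general case.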


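New Commitment Scheme. We use the following variant of the knowledge commitment
scheme from with a generalized choice of generators, defined as follows:

CRS generation: Let $\Lambda$ be an $(n, \kappa)$-nice tuple with $n = \mathrm{poly}(\kappa)$. Define $\lambda_0 = 0$.
Given a bilinear group generator $\mathcal{G}_{bp}$, set $gk := (p, G_1, G_2, G_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa)$. Let
$g_1 \leftarrow G_1 \setminus \{1\}$, $g_2 \leftarrow G_2 \setminus \{1\}$, and $\hat{\alpha}, x \leftarrow \mathbb{Z}_p$. Let $t \in \{1, 2\}$. The CRS is
$ck_t \leftarrow (gk; (g_{t,\lambda_i}, \hat{g}_{t,\lambda_i})_{i \in \{0, \ldots, n\}})$, where $g_{t\ell} = g_t^{x^\ell}$ and $\hat{g}_{t\ell} = g_t^{\hat{\alpha} x^\ell}$.

Commitment: To commit to $a = (a_1, \ldots, a_n) \in \mathbb{Z}_p^n$, the committing party chooses a
random $r \leftarrow \mathbb{Z}_p$, and defines

$$\mathrm{Com}^t(ck_t; a; r) := \Big(g_t^r \cdot \prod_{i=1}^n g_{t,\lambda_i}^{a_i},\ \hat{g}_t^r \cdot \prod_{i=1}^n \hat{g}_{t,\lambda_i}^{a_i}\Big).$$

Importantly, we allow $\Lambda$ to depend on the concrete application. Let t = 1. Fix a
commitment key $ck_1$ that in particular specifies $g_2, \hat{g}_2 \in G_2$. A commitment $(A, \hat{A}) \in G_1^2$
is valid if $\hat{e}(A, \hat{g}_2) = \hat{e}(\hat{A}, g_2)$. The case t = 2 is dual.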


Theorem 3. Let $t \in \{1,2\}$. The knowledge commitment scheme is perfectly hiding in
$\mathbb{G}_t$, and computationally binding in $\mathbb{G}_t$ under the $\Lambda$-PDL assumption in $\mathbb{G}_t$. If the
$\Lambda$-PKE assumption holds in $\mathbb{G}_t$, then for any non-uniform PPT $\mathcal{A}$ that outputs some valid
knowledge commitments, there exists a non-uniform PPT extractor $X_{\mathcal{A}}$ that, given the
input of $\mathcal{A}$ together with $\mathcal{A}$'s random coins, extracts the contents of these commitments.


In the case of all security reductions in this paper, the tightness of the security reduction
depends on the value $\lambda_n$. Clearly, the knowledge commitment scheme is also trapdoor,
with the trapdoor being $td = x$: after trapdoor-committing $A \leftarrow \mathrm{Com}^t(ck; \mathbf{0}; r) = g_t^{r}$
for $r \leftarrow \mathbb{Z}_p$, the committer can open it to $(\mathbf{a}; r - \sum_{i=1}^{n} a_i x^{\lambda_i})$ for any $\mathbf{a}$.
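
The commitment, the validity check and the trapdoor opening can be sketched in a toy model that tracks only discrete logarithms, so that a group element $g_t^{m}$ is represented by $m \bmod p$ and a pairing by a product of exponents. This is an illustrative assumption, not the scheme itself: the modulus `P`, the set `lams` and all function names are hypothetical, and working "in the exponent" offers no security.

```python
import random

P = 2**61 - 1  # prime group order of the toy model (an assumption)

def commit(lams, x, alpha, a, r):
    """Exponents of (A, A_hat) = Com(ck; a; r) with respect to g_t."""
    m = (r + sum(ai * pow(x, lam, P) for ai, lam in zip(a, lams))) % P
    return m, alpha * m % P

def is_valid(alpha, A, A_hat):
    """Pairing check e(A, g^alpha) = e(A_hat, g), carried out on exponents."""
    return A * alpha % P == A_hat

def trapdoor_open(lams, x, r, a):
    """Randomizer r' such that Com(ck; a; r') equals Com(ck; 0; r)."""
    return (r - sum(ai * pow(x, lam, P) for ai, lam in zip(a, lams))) % P

rng = random.Random(0)
lams = [1, 3, 7]                      # hypothetical choice of Lambda
x, alpha = rng.randrange(1, P), rng.randrange(1, P)
r = rng.randrange(P)
A0 = commit(lams, x, alpha, [0, 0, 0], r)          # trapdoor commitment to 0
a = [5, 0, 9]
A1 = commit(lams, x, alpha, a, trapdoor_open(lams, x, r, a))
```

Here `A0` and `A1` coincide, which is the trapdoor property: knowing $x$, a commitment to the zero tuple can be reopened to any tuple.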


6 New Hadamard Product Argument 


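Assume that $(\mathcal{G}_{com}, \mathrm{Com})$ is the knowledge commitment scheme. In an Hadamard
product argument (in group $\mathbb{G}_1$; the case of $\mathbb{G}_2$ is dual), the prover aims to convince the
verifier that given commitments $A$, $B$ and $C$, he can open them as $A = \mathrm{Com}^1(ck; \mathbf{a}; r_a)$,
$B = \mathrm{Com}^1(ck; \mathbf{b}; r_b)$, and $C = \mathrm{Com}^1(ck; \mathbf{c}; r_c)$, s.t. $c_j = a_j b_j$ for $j \in [n]$. Groth
constructed an Hadamard product argument with communication of 5 group elements,
verifier's computation $O(n)$, prover's computation of $O(n^2)$ exponentiations and the
CRS of $O(n^2)$ group elements. We present a more efficient argument in Prot. 1.
Intuitively, the discrete logarithm on basis $h = \hat{e}(g_1, g_2)$ of $\hat{e}(A, B_2)/\hat{e}(C, D) = \hat{e}(g_1, \psi)$
is a degree-$n$ formal polynomial in $X$, which is spanned by $\{X^\ell\}_{\ell \in 2\cdot\Lambda \cup \hat\Lambda}$, where


$$\hat\Lambda := \{0\} \cup \Lambda \cup 2\widehat{\ }\Lambda . \qquad (4)$$

Here $2\cdot\Lambda = \{2\lambda_i : i \in [n]\}$, and $2\widehat{\ }\Lambda = \{\lambda_i + \lambda_j : i, j \in [n] \wedge i \neq j\}$ is the restricted sumset.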


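We need that $2\cdot\Lambda$ and $\hat\Lambda$ do not intersect. The next lemma is straightforward to prove.

System parameters: Let $n = \mathrm{poly}(\kappa)$. Let $\Lambda = \{\lambda_i : i \in [n]\}$ be a progression-free set of
odd integers, such that $\lambda_{i+1} > \lambda_i > 0$. Denote $\lambda_0 := 0$. Let $\hat\Lambda$ be as in Eq. (4).

CRS generation $\mathcal{G}_{crs}(1^\kappa)$: Let $gk := (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa)$. Let $\hat\alpha, x \leftarrow \mathbb{Z}_p$.
Let $g_1 \leftarrow \mathbb{G}_1 \setminus \{1\}$ and $g_2 \leftarrow \mathbb{G}_2 \setminus \{1\}$. Denote $g_{t\ell} \leftarrow g_t^{x^\ell}$ and $\hat{g}_{t\ell} \leftarrow g_t^{\hat\alpha x^\ell}$
for $t \in \{1,2\}$ and $\ell \in \{0\} \cup \hat\Lambda$. Let $D \leftarrow \prod_{i=1}^{n} g_{2,\lambda_i}$. The CRS is
$crs \leftarrow (gk; (g_{1\ell}, \hat{g}_{1\ell})_{\ell \in \{0\} \cup \Lambda}, (g_{2\ell}, \hat{g}_{2\ell})_{\ell \in \hat\Lambda}, D)$. Let $\widehat{ck}_1 \leftarrow (gk; (g_{1\ell}, \hat{g}_{1\ell})_{\ell \in \{0\} \cup \Lambda})$.

Common inputs: $(A, \hat{A}, B, \hat{B}, B_2, C, \hat{C})$, where $(A, \hat{A}) \leftarrow \mathrm{Com}^1(\widehat{ck}_1; \mathbf{a}; r_a)$,
$(B, \hat{B}) \leftarrow \mathrm{Com}^1(\widehat{ck}_1; \mathbf{b}; r_b)$, $B_2 \leftarrow g_2^{r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{b_i}$, $(C, \hat{C}) \leftarrow \mathrm{Com}^1(\widehat{ck}_1; \mathbf{c}; r_c)$,
s.t. $a_i b_i = c_i$ for $i \in [n]$.

Argument generation $P_\times(crs; (A, \hat{A}, B, \hat{B}, B_2, C, \hat{C}), (\mathbf{a}, r_a, \mathbf{b}, r_b, \mathbf{c}, r_c))$: Let
$I_1(\ell) := \{(i, j) : i, j \in [n] \wedge j \neq i \wedge \lambda_i + \lambda_j = \ell\}$. For $\ell \in 2\widehat{\ }\Lambda$, the prover sets
$\mu_\ell \leftarrow \sum_{(i,j) \in I_1(\ell)} (a_i b_j - c_i)$. He sets
$\psi \leftarrow g_2^{r_a r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{r_a b_i + r_b a_i - r_c} \cdot \prod_{\ell \in 2\widehat{\ }\Lambda} g_{2\ell}^{\mu_\ell}$ and
$\hat\psi \leftarrow \hat{g}_2^{r_a r_b} \cdot \prod_{i=1}^{n} \hat{g}_{2,\lambda_i}^{r_a b_i + r_b a_i - r_c} \cdot \prod_{\ell \in 2\widehat{\ }\Lambda} \hat{g}_{2\ell}^{\mu_\ell}$.
He sends $\psi^\times \leftarrow (\psi, \hat\psi) \in \mathbb{G}_2^2$ to the verifier as the argument.

Verification $V_\times(crs; (A, \hat{A}, B, \hat{B}, B_2, C, \hat{C}), \psi^\times)$: accept iff $\hat{e}(A, B_2)/\hat{e}(C, D) = \hat{e}(g_1, \psi)$
and $\hat{e}(g_1, \hat\psi) = \hat{e}(\hat{g}_1, \psi)$.


Protocol 1: New Hadamard product argument $[(A, \hat{A})] \circ [(B, \hat{B}, B_2)] = [(C, \hat{C})]$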
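
As a sanity check of the first verification equation of Prot. 1, the following sketch again works on discrete logarithms only: each group element is replaced by its exponent modulo a toy prime `P`, and a pairing becomes a product of exponents. The names `com`, `prove`, `verify`, `run` and the concrete set `[1, 3, 7]` are hypothetical. Unlike the real prover, who only uses the CRS elements $g_{2\ell}$, this toy prover is handed $x$ directly, and the knowledge components (the hatted elements) are omitted.

```python
import random

P = 2**61 - 1  # toy prime group order (an assumption)

def com(lams, x, a, r):
    """Discrete log of the first component of Com(ck; a; r)."""
    return (r + sum(ai * pow(x, lam, P) for ai, lam in zip(a, lams))) % P

def prove(lams, x, a, ra, b, rb, c, rc):
    """Discrete log of psi, following the prover's formula in Prot. 1."""
    n = len(lams)
    mu = {}  # mu[ell] for ell = lams[i] + lams[j], i != j
    for i in range(n):
        for j in range(n):
            if i != j:
                ell = lams[i] + lams[j]
                mu[ell] = (mu.get(ell, 0) + a[i] * b[j] - c[i]) % P
    psi = ra * rb
    psi += sum((ra * b[i] + rb * a[i] - rc) * pow(x, lams[i], P) for i in range(n))
    psi += sum(m * pow(x, ell, P) for ell, m in mu.items())
    return psi % P

def verify(lams, x, A, B2, C, psi):
    """e(A, B2) / e(C, D) = e(g1, psi), checked on exponents."""
    D = sum(pow(x, lam, P) for lam in lams) % P
    return (A * B2 - C * D) % P == psi

def run(a, b, c, seed=1):
    rng = random.Random(seed)
    lams = [1, 3, 7]  # progression-free set of odd integers
    x = rng.randrange(1, P)
    ra, rb, rc = (rng.randrange(P) for _ in range(3))
    A, B2, C = com(lams, x, a, ra), com(lams, x, b, rb), com(lams, x, c, rc)
    return verify(lams, x, A, B2, C, prove(lams, x, a, ra, b, rb, c, rc))
```

With `c` equal to the entrywise product of `a` and `b` the check passes. If some $c_i \neq a_i b_i$, the left-hand side keeps an extra term $(a_i b_i - c_i) x^{2\lambda_i}$ that the prover's formula cannot supply, so the check fails.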


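Lemma 1. 1) If $\Lambda$ is a progression-free set of odd integers, then $2\cdot\Lambda \cap \hat\Lambda = \emptyset$. 2) If
$2\cdot\Lambda \cap \hat\Lambda = \emptyset$, then $\Lambda$ is a progression-free set.


Moreover, since $\hat\Lambda \subseteq \{0, \dots, 2\lambda_n\}$, then by Thm. 1,
Lemma 2. For any value $n$ there exists a choice of $\Lambda$ such that $|\hat\Lambda| = n^{1+o(1)}$.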
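
Lemma 1 is easy to test on small sets. The helper names below are hypothetical; `hat_set` follows Eq. (4) and `doubled` computes $2\cdot\Lambda$.

```python
from itertools import combinations

def is_progression_free(lams):
    """True iff no three distinct elements form an arithmetic progression."""
    s = set(lams)
    return not any((lo + hi) % 2 == 0 and (lo + hi) // 2 in s
                   for lo, hi in combinations(sorted(s), 2))

def hat_set(lams):
    """{0} united with Lambda and with all sums of two distinct elements."""
    return {0} | set(lams) | {u + v for u, v in combinations(set(lams), 2)}

def doubled(lams):
    """The set of doubled elements of Lambda."""
    return {2 * lam for lam in lams}
```

For the progression-free set `[1, 3, 7]` the two sets are disjoint, while for `[1, 3, 5]` the progression 1, 3, 5 produces the collision $2 \cdot 3 = 1 + 5$.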


We are now ready to state the security of the new Hadamard product argument for the
knowledge commitment scheme. The (knowledge) commitments are $(A, \hat{A})$, $(B, \hat{B})$
and $(C, \hat{C})$. For efficiency reasons, we include another element $B_2$ in the Hadamard
product language. We denote the argument in Prot. 1 by $[(A, \hat{A})] \circ [(B, \hat{B}, B_2)] =
[(C, \hat{C})]$. Since $(C, \hat{C})$ is always a commitment of $(a_1 b_1, \dots, a_n b_n)$ for some value
of $r_c$, we cannot claim that Prot. 1 is computationally sound (even under a knowledge
assumption). Instead, analogously to [15], we prove a somewhat weaker version of
soundness that is however sufficient to achieve soundness of the Circuit-SAT argument.
Note that the last statement of the theorem basically says that no efficient adversary can
output an input to the Hadamard product argument together with an accepting argument
and openings to all commitments and all other pairs of type $(y, \hat{y})$ that are present in
the argument, such that $a_i b_i \neq c_i$ for some $i \in [n]$. Intuitively, the theorem statement
includes $f'_\ell$ only for $\ell \in \hat\Lambda$ (resp., $a_\ell$ for $\ell \in \Lambda$ together with $r$) since $\hat{g}_{2\ell}$ (resp., $\hat{g}_{1\ell}$)
belongs to the CRS only for $\ell \in \hat\Lambda$ (resp., $\ell \in \{0\} \cup \Lambda$).


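Theorem 4. Prot. 1 is perfectly complete and perfectly witness-indistinguishable. If
$\mathcal{G}_{bp}$ is $\hat\Lambda$-PSDL secure, then a non-uniform PPT adversary has negligible chance
of outputting $inp^\times \leftarrow (A, \hat{A}, B, \hat{B}, B_2, C, \hat{C})$ and an accepting argument $\psi^\times \leftarrow
(\psi, \hat\psi)$ together with a witness $w^\times \leftarrow (\mathbf{a}, r_a, \mathbf{b}, r_b, \mathbf{c}, r_c, (f'_\ell)_{\ell \in \hat\Lambda})$, s.t. $(A, \hat{A}) =
\mathrm{Com}^1(\widehat{ck}_1; \mathbf{a}; r_a)$, $(B, \hat{B}) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{b}; r_b)$, $B_2 = g_2^{r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{b_i}$, $(C, \hat{C}) =
\mathrm{Com}^1(\widehat{ck}_1; \mathbf{c}; r_c)$, $(\psi, \hat\psi) = (g_2^{\sum_{\ell \in \hat\Lambda} f'_\ell x^\ell}, \hat{g}_2^{\sum_{\ell \in \hat\Lambda} f'_\ell x^\ell})$, and for some $i \in [n]$, $a_i b_i \neq c_i$.

The commitment scheme is defined as in Sect. 5 with respect to the set $\Lambda$. The following
proof will make the intuition of Sect. 3 more formal. Note that the tightness of the
reduction depends on the time it takes to factor a degree-$(2\lambda_n + 1)$ polynomial.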


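Proof. Let $h \leftarrow \hat{e}(g_1, g_2)$ and $F(x) \leftarrow \log_h(\hat{e}(A, B_2)/\hat{e}(C, D))$ as in the earlier intuition.
WITNESS-INDISTINGUISHABILITY: since the argument $\psi^\times = (\psi, \hat\psi)$ that satisfies the
verification equations is unique, all witnesses result in the same argument, and therefore
the Hadamard product argument is witness-indistinguishable.

PERFECT COMPLETENESS. Assume that the prover is honest. The second verification
is straightforward. For the first one, due to the discussion in Sect. 3, $F(x) = F_{con}(x) +
F_\pi(x)$, where $F_{con}(x)$ and $F_\pi(x)$ are as defined by Eq. (2) and Eq. (3). Consider $x$ to be
a formal variable; then $F(X)$ is a formal polynomial of $X$. This formal polynomial is
spanned by $\{X^\ell\}_{\ell \in 2\cdot\Lambda \cup \hat\Lambda}$. If the prover is honest, then $c_i = a_i \cdot b_i$ for $i \in [n]$, and thus
$F(X) = F_\pi(X)$ is spanned by $\{X^\ell\}_{\ell \in \hat\Lambda}$. Denoting

$$\psi \leftarrow g_2^{r_a r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{r_a b_i + r_b a_i - r_c} \cdot \prod_{i=1}^{n} \prod_{j=1 : j \neq i}^{n} g_{2,\lambda_i + \lambda_j}^{a_i b_j - c_i} = g_2^{r_a r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{r_a b_i + r_b a_i - r_c} \cdot \prod_{\ell \in 2\widehat{\ }\Lambda} g_{2\ell}^{\mu_\ell} ,$$

we see that clearly $\hat{e}(g_1, \psi) = h^{F(x)}$. Thus, the first verification succeeds.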

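WEAKER VERSION OF SOUNDNESS. Assume that $\mathcal{A}$ is an adversary that can break
the last statement of the theorem. We construct an adversary $\mathcal{A}'$ against the $\hat\Lambda$-PSDL
assumption. Let $gk \leftarrow \mathcal{G}_{bp}(1^\kappa)$, $x \leftarrow \mathbb{Z}_p$, $g_1 \leftarrow \mathbb{G}_1 \setminus \{1\}$, and $g_2 \leftarrow \mathbb{G}_2 \setminus \{1\}$. The
adversary $\mathcal{A}'$ receives $crs \leftarrow (gk; (g_1^{x^\ell}, g_2^{x^\ell})_{\ell \in \hat\Lambda})$ as her input, and her task is to output $x$. She
sets $\hat\alpha \leftarrow \mathbb{Z}_p$, $crs' \leftarrow (gk; (g_1^{x^\ell}, g_1^{\hat\alpha x^\ell})_{\ell \in \{0\} \cup \Lambda}, (g_2^{x^\ell}, g_2^{\hat\alpha x^\ell})_{\ell \in \hat\Lambda}, \prod_{i=1}^{n} g_2^{x^{\lambda_i}})$, and then
sends $crs'$ to $\mathcal{A}$. Clearly, $crs'$ has the same distribution as $\mathcal{G}_{crs}(1^\kappa)$. Both $\mathcal{A}$ and $\mathcal{A}'$ set
$\widehat{ck}_t \leftarrow (gk; (g_t^{x^\ell}, g_t^{\hat\alpha x^\ell})_{\ell \in \{0\} \cup \Lambda})$ for $t \in \{1,2\}$. Assume that $\mathcal{A}$ returns $(inp^\times, w^\times, \psi^\times)$
such that the conditions in the theorem statement hold, and $V(crs'; inp^\times, \psi^\times)$ accepts.
Here, $inp^\times = (A, \hat{A}, B, \hat{B}, B_2, C, \hat{C})$ and $w^\times = (\mathbf{a}, r_a, \mathbf{b}, r_b, \mathbf{c}, r_c, (f'_\ell)_{\ell \in \hat\Lambda})$.

If $\mathcal{A}$ is successful, $(A, \hat{A}) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{a}; r_a)$, $(B, \hat{B}) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{b}; r_b)$, $B_2 =
g_2^{r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{b_i}$, $(C, \hat{C}) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{c}; r_c)$, and for some $i \in [n]$, $c_i \neq a_i b_i$. Since
$2\cdot\Lambda \cap \hat\Lambda = \emptyset$, $\mathcal{A}'$ has thus expressed $F(X)$ as a polynomial $f(X)$ where at least for
some $\ell \in 2\cdot\Lambda$, $X^\ell$ has a non-zero coefficient $a_i b_i - c_i$.

On the other hand, $\mathcal{A}$ also outputs $(f'_\ell)_{\ell \in \hat\Lambda}$, s.t. $F(x) = \log_{g_2} \psi = f'(x)$, where all
non-zero coefficients of $f'(X) := \sum_{\ell \in \hat\Lambda} f'_\ell X^\ell$ correspond to $X^\ell$ for some $\ell \in \hat\Lambda$. Since
$\Lambda$ is a progression-free set of odd integers and all elements of $2\cdot\Lambda$ are distinct, then by
Lem. 1, $\ell \notin 2\cdot\Lambda$. Thus, all coefficients of $f'(X)$ corresponding to any $X^\ell$, $\ell \in 2\cdot\Lambda$,
are equal to 0. Thus $f(X) = \sum_{\ell \in \hat\Lambda \cup (2\cdot\Lambda)} f_\ell X^\ell$ and $f'(X) = \sum_{\ell \in \hat\Lambda} f'_\ell X^\ell$ are different
polynomials with $f(x) = f'(x) = F(x)$. Thus, $\mathcal{A}'$ has succeeded in creating a non-zero
polynomial $d(X) = f(X) - f'(X)$, such that $d(x) = \sum_{\ell \in \hat\Lambda \cup (2\cdot\Lambda)} d_\ell x^\ell = 0$.

Next, $\mathcal{A}'$ uses an efficient polynomial factorization algorithm in $\mathbb{Z}_p[X]$ to
efficiently compute all $< 2\lambda_n + 1$ roots of $d(X)$. For some root $y$, $g_1^{x^\ell} = g_1^{y^\ell}$. The adversary
$\mathcal{A}'$ sets $x \leftarrow y$, thus violating the $\hat\Lambda$-PSDL assumption.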
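
The last step of the reduction can be illustrated in a deliberately tiny group. In this sketch the safe-prime parameters `Q`, `P`, `G`, the secret `x` and the polynomial `d` are all made-up values, it is assumed that the CRS contains $g^{x}$ (that is, that 1 is one of the available exponents), and exhaustive search over $\mathbb{Z}_p$ stands in for the polynomial factorization algorithm, which is only feasible because $p$ is so small here.

```python
Q, P, G = 2039, 1019, 4  # toy safe-prime group: G has prime order P in Z_Q^*

def poly_eval(coeffs, y):
    """Evaluate a sparse polynomial {exponent: coefficient} at y modulo P."""
    return sum(c * pow(y, e, P) for e, c in coeffs.items()) % P

def recover_x(d, g_x):
    """Return the root y of d with G^y == g_x (brute force replaces factoring)."""
    for y in range(P):
        if poly_eval(d, y) == 0 and pow(G, y, Q) == g_x:
            return y
    return None

x = 777                        # secret exponent, unknown to the reduction
d = {2: 5, 6: 7}               # some non-zero coefficients of d(X)
d[0] = -poly_eval(d, x) % P    # force d(x) = 0, as guaranteed in the proof
found = recover_x(d, pow(G, x, Q))
```

Each candidate root is tested against a CRS element, so the reduction never needs to know $x$ in advance; it only needs `d` to be non-zero with $d(x) = 0$.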


The Hadamard product argument is not perfectly zero-knowledge. The problem is that
the simulator knows $td = (\hat\alpha, x)$, but given $td$ and the common input she will not
be able to generate $\psi^\times$. E.g., she has to compute

$$\psi = g_2^{r_a r_b} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{r_a b_i + r_b a_i - r_c} \cdot \prod_{i=1}^{n} \prod_{j=1 : j \neq i}^{n} g_{2,\lambda_i + \lambda_j}^{a_i b_j - c_i}$$

based on the input, $\hat\alpha$ and $x$, but without knowing the witness.
This seems to be impossible. Technically, the problem is that due to the knowledge
of the trapdoor, the simulator can, knowing one opening $(\mathbf{a}, r)$, produce an opening
$(\mathbf{a}', r')$ to any other $\mathbf{a}'$. However, here she does not know any openings. Similarly, the
permutation argument of Sect. 7 is not zero-knowledge. On the other hand, in the final
circuit satisfiability argument of Sect. 8 the simulator creates all commitments by
herself and can thus properly simulate the argument. By the same reason, the subarguments
of are not zero-knowledge but the final argument (for circuit satisfiability) is. 

Let $\Lambda$ be as described in Thm. 1. The communication (argument size) of Prot. 1 is 2
elements from $\mathbb{G}_2$. The prover's computational complexity is $O(n^2)$ scalar multiplications
in $\mathbb{Z}_p$ and $n^{1+o(1)}$ exponentiations in $\mathbb{G}_2$. The verifier's computational complexity
is dominated by 5 bilinear pairings and 1 bilinear-group multiplication. The CRS
consists of $n^{1+o(1)}$ group elements, with the verifier's part of the CRS consisting of only
the bilinear group description plus 5 group elements.

In the Circuit-SAT argument, all $a_i$, $b_i$ and $c_i$ are Boolean, and thus all $n^{1+o(1)}$ values
$\mu_\ell$ can be computed in $n(n-1) = O(n^2)$ scalar additions (the prover also needs to use
other operations like comparisons $j \neq i$, but they can be eliminated by using loop
unrolling, and $\lambda_i$ and $\lambda_j$ can be computed by using table lookups), as follows:


1. For $\ell \in 2\widehat{\ }\Lambda$ do: $\mu_\ell \leftarrow 0$
2. For $i = 1$ to $n$ do:
   - If $a_i = 0$ then for $j = 1$ to $n$ do: if $j \neq i$ then $\mu_{\lambda_i + \lambda_j} \leftarrow \mu_{\lambda_i + \lambda_j} - c_i$
   - Else for $j = 1$ to $n$ do: if $j \neq i$ then $\mu_{\lambda_i + \lambda_j} \leftarrow \mu_{\lambda_i + \lambda_j} + b_j - c_i$
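
The two-step procedure above translates directly into code. The function names are hypothetical; `mu_direct` is only a reference computation from the definition $\mu_\ell = \sum_{(i,j) \in I_1(\ell)} (a_i b_j - c_i)$, used to cross-check the addition-only variant.

```python
def mu_boolean(lams, a, b, c):
    """mu values for Boolean a, b, c, using additions and subtractions only."""
    n = len(lams)
    mu = {lams[i] + lams[j]: 0 for i in range(n) for j in range(n) if i != j}
    for i in range(n):
        if a[i] == 0:
            for j in range(n):
                if j != i:
                    mu[lams[i] + lams[j]] -= c[i]
        else:
            for j in range(n):
                if j != i:
                    mu[lams[i] + lams[j]] += b[j] - c[i]
    return mu

def mu_direct(lams, a, b, c):
    """Reference computation straight from the definition (uses products)."""
    n = len(lams)
    mu = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                ell = lams[i] + lams[j]
                mu[ell] = mu.get(ell, 0) + a[i] * b[j] - c[i]
    return mu
```

The branch on $a_i$ replaces the product $a_i b_j$ by either $0$ or $b_j$, which is why no multiplications are needed when the inputs are bits.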


7 New Permutation Argument 


In a permutation argument, the prover aims to convince the verifier that for given
permutation $\varrho \in S_n$ and two commitments $A$ and $B$, he knows how to open them as
$A = \mathrm{Com}^1(ck; \mathbf{a}; r_a)$ and $B = \mathrm{Com}^1(ck; \mathbf{b}; r_b)$, such that $b_j = a_{\varrho(j)}$ for $j \in [n]$. We
assume that $\varrho$ is a part of the statement. In [15], Groth constructed a permutation
argument, where the prover's computation is $O(n^2)$ exponentiations and the CRS has $O(n^2)$
group elements. We now propose a new argument with the CRS of $n^{1+o(1)}$ group
elements. We also improve the prover's concrete computation, and the argument is based
on a (probably) weaker assumption.

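The new permutation argument $\varrho([(A, \tilde{A})]) = [(B, \tilde{B})]$, see Prot. 2, uses
(almost) the same high-level ideas as the Hadamard product argument from Sect. 6.
However, the situation is more complicated. Consider the verification equation
$\hat{e}(g_1, \psi^\varrho) = \hat{e}(A, \prod_{i=1}^{n} g_{2,\lambda_i}) / \hat{e}(B, \prod_{i=1}^{n} g_{2,2\lambda_{\varrho(i)} - \lambda_i})$ from (13). Letting $h = \hat{e}(g_1, g_2)$,

$$F_\varrho(x) := \log_{g_2} \psi^\varrho = \sum_{i=1}^{n} (a_{\varrho(i)} - b_i) x^{2\lambda_{\varrho(i)}} + r_a \sum_{i=1}^{n} x^{\lambda_i} - r_b \sum_{i=1}^{n} x^{2\lambda_{\varrho(i)} - \lambda_i}
+ \sum_{i=1}^{n} a_{\varrho(i)} \sum_{j=1 : j \neq i}^{n} x^{\lambda_{\varrho(i)} + \lambda_{\varrho(j)}} - \sum_{i=1}^{n} b_i \sum_{j=1 : j \neq i}^{n} x^{\lambda_i + 2\lambda_{\varrho(j)} - \lambda_j} .$$

Following Sect. 6 we require that
$\tilde\Lambda = \Lambda \cup \{2\lambda_k - \lambda_i\} \cup 2\widehat{\ }\Lambda \cup \{\lambda_i + 2\lambda_k - \lambda_j : i \neq j\}$ and $2\cdot\Lambda$ do not intersect.
Since $\varrho$ is a part of the statement, we replaced $\varrho(i)$ and $\varrho(j)$ with a new element $k$.

Assume that $\Lambda$ is a progression-free set of odd integers. Since $\Lambda$ consists of odd
integers, $(\Lambda \cup \{2\lambda_k - \lambda_i\}) \cap 2\cdot\Lambda = \emptyset$. Since $\Lambda$ is a progression-free set, $2\widehat{\ }\Lambda \cap 2\cdot\Lambda = \emptyset$.
However, we also need that $2\lambda_{k'} \neq 2\lambda_k + \lambda_i - \lambda_j$ for $i \neq j$. That is, one can uniquely
represent any non-negative integer $a$ as $a = 2\lambda_k + \lambda_j$. (It is only required that any
non-negative integer $a$ has at most one representation as $a = 2\lambda_k + \lambda_j$. See the full
version.) The unique sequence $\Lambda = (\lambda_i)_{i \in \mathbb{Z}^+}$ (the Moser-de Bruijn sequence [23]) that
satisfies this property is the sequence of all non-negative integers that have only 0 or 1
as their radix-4 digits. Since $\lambda_n = \Theta(n^2)$, this sequence is not good enough.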
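
The unique-representation property and the quadratic growth of the Moser-de Bruijn sequence can be checked numerically. The function name and the 0-based indexing are choices made for this sketch only.

```python
from collections import Counter

def moser_de_bruijn(k):
    """k-th term (k >= 0): read the binary digits of k as radix-4 digits."""
    return int(bin(k)[2:], 4)

terms = [moser_de_bruijn(k) for k in range(8)]   # all terms below 4**3
reps = Counter(2 * u + v for u in terms for v in terms)
```

Every integer in $\{0, \dots, 63\}$ appears exactly once as $2u + v$ with $u, v$ among the first eight terms. The term with index $2^{k}$ equals $4^{k}$, which is the quadratic growth that makes the sequence too sparse for the CRS size targeted here.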
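Fortunately, we can overcome this problem as follows. For $i \in [n]$ and a permutation
$\varrho$, let $T_\Lambda(i, \varrho) := |\{j \in [n] : 2\lambda_{\varrho(i)} + \lambda_j = 2\lambda_{\varrho(j)} + \lambda_i\}|$. Note that $1 \le T_\Lambda(i, \varrho) \le n$,
and that for fixed $\Lambda$ and $\varrho$, the whole tuple $T_\Lambda(\varrho) := (T_\Lambda(1, \varrho), \dots, T_\Lambda(n, \varrho))$ can be
computed in $O(n)$ simple arithmetic operations. We can then rewrite $F_\varrho(x)$ as

$$F_\varrho(x) = \sum_{i=1}^{n} (a_{\varrho(i)} - T_\Lambda(i, \varrho) \cdot b_i) x^{2\lambda_{\varrho(i)}} + r_a \sum_{i=1}^{n} x^{\lambda_i} - r_b \sum_{i=1}^{n} x^{2\lambda_{\varrho(i)} - \lambda_i}
+ \sum_{i=1}^{n} a_{\varrho(i)} \sum_{j=1 : j \neq i}^{n} x^{\lambda_{\varrho(i)} + \lambda_{\varrho(j)}} - \sum_{i=1}^{n} b_i \sum_{\substack{j=1 : j \neq i \\ 2\lambda_{\varrho(i)} + \lambda_j \neq \lambda_i + 2\lambda_{\varrho(j)}}}^{n} x^{\lambda_i + 2\lambda_{\varrho(j)} - \lambda_j} \qquad (5)$$

with $\tilde\Lambda$ being redefined as

$$\tilde\Lambda := \Lambda \cup \{2\lambda_k - \lambda_i\} \cup 2\widehat{\ }\Lambda \cup (\{\lambda_i + 2\lambda_k - \lambda_j : i \neq j\} \setminus 2\cdot\Lambda) . \qquad (6)$$


Since $\tilde\Lambda \cap 2\cdot\Lambda = \emptyset$, $\hat{e}(A, D)/\hat{e}(B, E_\varrho) = \hat{e}(g_1, \psi^\varrho)$ convinces the verifier that $a_{\varrho(i)} =
T_\Lambda(i, \varrho) \cdot b_i$ for $i \in [n]$. To finish the permutation argument, we let $(A^*, \hat{A}^*)$ be a
commitment to $(a^*_1, \dots, a^*_n) := (T_\Lambda(\varrho^{-1}(1), \varrho) \cdot a_1, \dots, T_\Lambda(\varrho^{-1}(n), \varrho) \cdot a_n)$, use an
Hadamard product argument to show that $a^*_i = T_\Lambda(\varrho^{-1}(i), \varrho) \cdot a_i$ (and thus $a^*_{\varrho(i)} =
T_\Lambda(i, \varrho) \cdot a_{\varrho(i)}$) for $i \in [n]$, and an argument as described above in this section to show
that $a^*_{\varrho(i)} = T_\Lambda(i, \varrho) \cdot b_i$ for $i \in [n]$. Therefore, $a_{\varrho(i)} = b_i$ for $i \in [n]$.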


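Clearly $\hat\Lambda \cup \tilde\Lambda = \{0\} \cup \tilde\Lambda$. Since $\tilde\Lambda \subset \{-\lambda_n + 1, \dots, 3\lambda_n\}$, then by Thm. 1,


Lemma 3. For any $n$ there exists a choice of $\Lambda$ such that $|\tilde\Lambda| = n^{1+o(1)}$.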
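
The tuple $T_\Lambda(\varrho)$ can be computed by grouping indices on the key $2\lambda_{\varrho(i)} - \lambda_i$, since the defining condition $2\lambda_{\varrho(i)} + \lambda_j = 2\lambda_{\varrho(j)} + \lambda_i$ says exactly that $i$ and $j$ share this key. The sketch below uses 0-based indices and hypothetical function names; the hash-based counting is one way to reach a linear number of operations and is not claimed to be the method intended in the text.

```python
from collections import Counter

def t_tuple(lams, rho):
    """T_Lambda(i, rho) for all i, via the key 2*lams[rho[i]] - lams[i]."""
    keys = [2 * lams[rho[i]] - lams[i] for i in range(len(lams))]
    counts = Counter(keys)
    return [counts[k] for k in keys]

def t_naive(lams, rho, i):
    """Direct count from the definition, for cross-checking."""
    return sum(1 for j in range(len(lams))
               if 2 * lams[rho[i]] + lams[j] == 2 * lams[rho[j]] + lams[i])
```

For `lams = [1, 3, 7, 9]` and `rho = [2, 0, 1, 3]` the second and third indices share the key $-1$, so their counts are 2 while the others are 1.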


We are now ready to state the security of the new permutation argument. The (weaker 
version of) soundness of this argument is based on exactly the same ideas as that of the 
Hadamard product argument. 


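Theorem 5. Prot. 2 is perfectly complete and perfectly witness-indistinguishable. If
$\mathcal{G}_{bp}$ is $\tilde\Lambda$-PSDL secure, then a non-uniform PPT adversary has negligible chance of
outputting $inp^{perm} \leftarrow (A, \tilde{A}, B, \hat{B}, \tilde{B}, \varrho)$ and an accepting $\psi^{perm} \leftarrow (A^*, \hat{A}^*, \psi^\times,
\hat\psi^\times, \psi^\varrho, \tilde\psi^\varrho)$ together with a witness $w^{perm} \leftarrow (\mathbf{a}, r_a, \mathbf{b}, r_b, \mathbf{a}^*, r_{a^*}, (f'_{(\times,\ell)})_{\ell \in \hat\Lambda},
(f'_{(\varrho,\ell)})_{\ell \in \tilde\Lambda})$, s.t. $(A, \tilde{A}) = \mathrm{Com}^1(\widetilde{ck}_1; \mathbf{a}; r_a)$, $(B, \hat{B}) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{b}; r_b)$, $(B, \tilde{B}) =
\mathrm{Com}^1(\widetilde{ck}_1; \mathbf{b}; r_b)$, $(A^*, \hat{A}^*) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{a}^*; r_{a^*})$,
$(\psi^\times, \hat\psi^\times) = (g_2^{\sum_{\ell \in \hat\Lambda} f'_{(\times,\ell)} x^\ell}, \hat{g}_2^{\sum_{\ell \in \hat\Lambda} f'_{(\times,\ell)} x^\ell})$,
$(\psi^\varrho, \tilde\psi^\varrho) = (g_2^{\sum_{\ell \in \tilde\Lambda} f'_{(\varrho,\ell)} x^\ell}, \tilde{g}_2^{\sum_{\ell \in \tilde\Lambda} f'_{(\varrho,\ell)} x^\ell})$, $a^*_i = T_\Lambda(\varrho^{-1}(i), \varrho) \cdot a_i$ for
$i \in [n]$, and for some $i \in [n]$, $a_{\varrho(i)} \neq b_i$.

System parameters: Same as in Prot. 1, but let $\tilde\Lambda$ be as in Eq. (6).

CRS generation $\mathcal{G}_{crs}(1^\kappa)$: Let $gk := (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, \hat{e}) \leftarrow \mathcal{G}_{bp}(1^\kappa)$. Let $\hat\alpha, \tilde\alpha, x \leftarrow \mathbb{Z}_p$. Let
$g_1 \leftarrow \mathbb{G}_1 \setminus \{1\}$ and $g_2 \leftarrow \mathbb{G}_2 \setminus \{1\}$. Let $\hat{g}_t \leftarrow g_t^{\hat\alpha}$ and $\tilde{g}_t \leftarrow g_t^{\tilde\alpha}$ for $t \in \{1, 2\}$.
Denote $g_{t\ell} \leftarrow g_t^{x^\ell}$, $\hat{g}_{t\ell} \leftarrow \hat{g}_t^{x^\ell}$ and $\tilde{g}_{t\ell} \leftarrow \tilde{g}_t^{x^\ell}$ for $t \in \{1,2\}$ and $\ell \in \{0\} \cup \tilde\Lambda$. Let
$(D, \tilde{D}) \leftarrow (\prod_{i=1}^{n} g_{2,\lambda_i}, \prod_{i=1}^{n} \tilde{g}_{2,\lambda_i})$. The CRS is

$$crs \leftarrow (gk; (g_{1\ell}, \hat{g}_{1\ell}, \tilde{g}_{1\ell})_{\ell \in \{0\} \cup \Lambda}, (g_{2\ell})_{\ell \in \{0\} \cup \tilde\Lambda}, (\hat{g}_{2\ell})_{\ell \in \hat\Lambda}, (\tilde{g}_{2\ell})_{\ell \in \tilde\Lambda}, D, \tilde{D}) .$$

Let $\widehat{ck}_1 \leftarrow (gk; (g_{1\ell}, \hat{g}_{1\ell})_{\ell \in \{0\} \cup \Lambda})$, $\widetilde{ck}_1 \leftarrow (gk; (g_{1\ell}, \tilde{g}_{1\ell})_{\ell \in \{0\} \cup \Lambda})$.
Common inputs: $(A, \tilde{A}, B, \hat{B}, \tilde{B}, \varrho)$, where $\varrho \in S_n$, $(A, \tilde{A}) \leftarrow \mathrm{Com}^1(\widetilde{ck}_1; \mathbf{a}; r_a)$,
$(B, \hat{B}) \leftarrow \mathrm{Com}^1(\widehat{ck}_1; \mathbf{b}; r_b)$, and $(B, \tilde{B}) \leftarrow \mathrm{Com}^1(\widetilde{ck}_1; \mathbf{b}; r_b)$, s.t. $b_j = a_{\varrho(j)}$ for $j \in [n]$.
Argument generation $P_{perm}(crs; (A, \tilde{A}, B, \hat{B}, \tilde{B}, \varrho), (\mathbf{a}, r_a, \mathbf{b}, r_b))$:
1. Let $(T^*, \hat{T}^*, T^*_2) \leftarrow (\prod_{i=1}^{n} g_{1,\lambda_i}^{T_\Lambda(\varrho^{-1}(i), \varrho)}, \prod_{i=1}^{n} \hat{g}_{1,\lambda_i}^{T_\Lambda(\varrho^{-1}(i), \varrho)}, \prod_{i=1}^{n} g_{2,\lambda_i}^{T_\Lambda(\varrho^{-1}(i), \varrho)})$.
2. Let $r_{a^*} \leftarrow \mathbb{Z}_p$, $(A^*, \hat{A}^*) \leftarrow \mathrm{Com}^1(\widehat{ck}_1; T_\Lambda(\varrho^{-1}(1), \varrho) \cdot a_1, \dots, T_\Lambda(\varrho^{-1}(n), \varrho) \cdot a_n; r_{a^*})$.
   Create an argument $\psi^\times$ for $[(A, \tilde{A})] \circ [(T^*, \hat{T}^*, T^*_2)] = [(A^*, \hat{A}^*)]$.
3. Let $\tilde\Lambda_\varrho := 2\widehat{\ }\Lambda \cup (\{2\lambda_{\varrho(j)} + \lambda_i - \lambda_j : i, j \in [n] \wedge i \neq j\} \setminus 2\cdot\Lambda) \subset \{-\lambda_n + 1, \dots, 3\lambda_n\}$.
4. For $\ell \in \tilde\Lambda_\varrho$, $I_1(\ell)$ as in Prot. 1, and $I_2(\ell) := \{(i, j) : i, j \in [n] \wedge j \neq i \wedge 2\lambda_{\varrho(i)} + \lambda_j \neq
   \lambda_i + 2\lambda_{\varrho(j)} \wedge 2\lambda_{\varrho(j)} + \lambda_i - \lambda_j = \ell\}$, set

   $$\mu_{\varrho,\ell} := \sum_{(i,j) \in I_1(\ell)} a^*_i - \sum_{(i,j) \in I_2(\ell)} b_i .$$

5. Let $(E_\varrho, \tilde{E}_\varrho) \leftarrow (\prod_{i=1}^{n} g_{2,2\lambda_{\varrho(i)} - \lambda_i}, \prod_{i=1}^{n} \tilde{g}_{2,2\lambda_{\varrho(i)} - \lambda_i})$.
6. Let $\psi^\varrho \leftarrow D^{r_{a^*}} \cdot E_\varrho^{-r_b} \cdot \prod_{\ell \in \tilde\Lambda_\varrho} g_{2\ell}^{\mu_{\varrho,\ell}}$, $\tilde\psi^\varrho \leftarrow \tilde{D}^{r_{a^*}} \cdot \tilde{E}_\varrho^{-r_b} \cdot \prod_{\ell \in \tilde\Lambda_\varrho} \tilde{g}_{2\ell}^{\mu_{\varrho,\ell}}$.
Send $\psi^{perm} \leftarrow (A^*, \hat{A}^*, \psi^\times, \psi^\varrho, \tilde\psi^\varrho) \in \mathbb{G}_1^2 \times \mathbb{G}_2^4$ to the verifier as the argument.
Verification $V_{perm}(crs; (A, \tilde{A}, B, \hat{B}, \tilde{B}, \varrho), \psi^{perm})$: Let $E_\varrho$ and $(T^*, \hat{T}^*, T^*_2)$ be computed as
in $P_{perm}$. If $\psi^\times$ verifies, $\hat{e}(A^*, D)/\hat{e}(B, E_\varrho) = \hat{e}(g_1, \psi^\varrho)$, $\hat{e}(A^*, \hat{g}_2) = \hat{e}(\hat{A}^*, g_2)$, and
$\hat{e}(g_1, \tilde\psi^\varrho) = \hat{e}(\tilde{g}_1, \psi^\varrho)$, then $V_{perm}$ accepts. Otherwise, $V_{perm}$ rejects.


Protocol 2: New permutation argument $\varrho([(A, \tilde{A})]) = [(B, \tilde{B})]$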
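
The verification equation $\hat{e}(A^*, D)/\hat{e}(B, E_\varrho) = \hat{e}(g_1, \psi^\varrho)$ can be exercised in a toy model that keeps only discrete logarithms modulo a prime `P`. Everything named here (`perm_check`, `xp`, the seed, the sample set) is hypothetical, indices are 0-based, the toy prover is handed $x$ directly, and the Hadamard sub-argument and the knowledge components are left out. Negative exponents such as $2\lambda_{\varrho(i)} - \lambda_i$ rely on three-argument `pow` accepting negative powers, which needs Python 3.8 or later.

```python
import random
from collections import Counter

P = 2**61 - 1  # toy prime group order (an assumption)

def xp(x, e):
    """x**e mod P; negative e is allowed because x is invertible modulo P."""
    return pow(x, e, P)

def perm_check(lams, rho, a_star, b, seed=2):
    n = len(lams)
    rng = random.Random(seed)
    x = rng.randrange(1, P)
    r_a, r_b = rng.randrange(P), rng.randrange(P)
    A = (r_a + sum(a_star[i] * xp(x, lams[i]) for i in range(n))) % P
    B = (r_b + sum(b[i] * xp(x, lams[i]) for i in range(n))) % P
    D = sum(xp(x, lams[i]) for i in range(n)) % P
    E = sum(xp(x, 2 * lams[rho[i]] - lams[i]) for i in range(n)) % P
    mu = Counter()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            mu[lams[i] + lams[j]] += a_star[i]                    # pairs in I_1
            if 2 * lams[rho[i]] + lams[j] != lams[i] + 2 * lams[rho[j]]:
                mu[lams[i] + 2 * lams[rho[j]] - lams[j]] -= b[i]  # pairs in I_2
    psi = (r_a * D - r_b * E + sum(m * xp(x, ell) for ell, m in mu.items())) % P
    return (A * D - B * E) % P == psi
```

The check passes exactly when $\sum_i (a^*_{\varrho(i)} - T_\Lambda(i, \varrho) \cdot b_i)\, x^{2\lambda_{\varrho(i)}} = 0$, so an honest `a_star` with $a^*_{\varrho(i)} = T_\Lambda(i, \varrho) \cdot b_i$ is accepted and a single altered entry is rejected.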


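Proof. Denote $h \leftarrow \hat{e}(g_1, g_2)$ and $F_\varrho(x) := \log_h(\hat{e}(A^*, D)/\hat{e}(B, E_\varrho))$.
WITNESS-INDISTINGUISHABILITY: since the argument $\psi^{perm}$ that satisfies the verification equations
is unique, all witnesses result in the same argument, and therefore the permutation
argument is witness-indistinguishable.

PERFECT COMPLETENESS. Completeness of $\psi^\times$ follows from the completeness of
the Hadamard product argument. The third and the fourth verifications are
straightforward. For the verification $\hat{e}(A^*, D)/\hat{e}(B, E_\varrho) = \hat{e}(g_1, \psi^\varrho)$, consider $F_\varrho(x)$ in
Eq. (5). Consider $X$ as a formal variable; then the right-hand side (and thus also
$F_\varrho(X)$) is a formal polynomial of $X$, spanned by $\{X^\ell\}_{\ell \in 2\cdot\Lambda \cup \tilde\Lambda}$. If the prover is
honest, then $b_i = a_{\varrho(i)}$ for $i \in [n]$, and thus $F_\varrho(X)$ is spanned by $\{X^\ell\}_{\ell \in \tilde\Lambda}$. Defining

$$\psi^\varrho \leftarrow \Bigl(\prod_{i=1}^{n} g_{2,\lambda_i}\Bigr)^{r_{a^*}} \cdot \Bigl(\prod_{i=1}^{n} g_{2,2\lambda_{\varrho(i)} - \lambda_i}\Bigr)^{-r_b} \cdot \prod_{i=1}^{n} \prod_{j=1 : j \neq i}^{n} g_{2,\lambda_i + \lambda_j}^{a^*_i} \cdot \prod_{i=1}^{n} \prod_{j \in I_3(i, \varrho)} g_{2,\lambda_i + 2\lambda_{\varrho(j)} - \lambda_j}^{-b_i} ,$$

where $I_3(i, \varrho) := \{j \in [n] : j \neq i \wedge 2\lambda_{\varrho(i)} + \lambda_j \neq \lambda_i + 2\lambda_{\varrho(j)}\}$, we see that the second
verification holds.

WEAKER VERSION OF SOUNDNESS. Assume that $\mathcal{A}$ is an adversary that can
break the last statement of the theorem. We construct an adversary $\mathcal{A}'$ against
the $\tilde\Lambda$-PSDL assumption. Let $gk \leftarrow \mathcal{G}_{bp}(1^\kappa)$, $x \leftarrow \mathbb{Z}_p$, $g_1 \leftarrow \mathbb{G}_1 \setminus \{1\}$, and
$g_2 \leftarrow \mathbb{G}_2 \setminus \{1\}$. The adversary $\mathcal{A}'$ receives $crs \leftarrow (gk; (g_1^{x^\ell}, g_2^{x^\ell})_{\ell \in \tilde\Lambda})$ as her
input, and her task is to output $x$. She sets $\hat\alpha \leftarrow \mathbb{Z}_p$, $\tilde\alpha \leftarrow \mathbb{Z}_p$, and
$crs' \leftarrow (gk; (g_1^{x^\ell}, g_1^{\hat\alpha x^\ell}, g_1^{\tilde\alpha x^\ell})_{\ell \in \{0\} \cup \Lambda}, (g_2^{x^\ell})_{\ell \in \{0\} \cup \tilde\Lambda}, (g_2^{\hat\alpha x^\ell})_{\ell \in \hat\Lambda}, (g_2^{\tilde\alpha x^\ell})_{\ell \in \tilde\Lambda}, \prod_{i=1}^{n} g_2^{x^{\lambda_i}}, \prod_{i=1}^{n} g_2^{\tilde\alpha x^{\lambda_i}})$,
and forwards $crs'$ to $\mathcal{A}$. Clearly, $crs'$ has the same distribution as $\mathcal{G}_{crs}(1^\kappa)$. Both parties
also set $\widehat{ck}_1 \leftarrow (gk; (g_1^{x^\ell}, g_1^{\hat\alpha x^\ell})_{\ell \in \{0\} \cup \Lambda})$ and $\widetilde{ck}_1 \leftarrow (gk; (g_1^{x^\ell}, g_1^{\tilde\alpha x^\ell})_{\ell \in \{0\} \cup \Lambda})$.

Assume that $\mathcal{A}$ returns $(inp^{perm}, w^{perm}, \psi^{perm})$ such that the conditions in the
theorem statement hold, and $V(crs'; inp^{perm}, \psi^{perm})$ accepts. Here, $inp^{perm} =
(A, \tilde{A}, B, \hat{B}, \tilde{B}, \varrho)$ and $w^{perm} = (\mathbf{a}, r_a, \mathbf{b}, r_b, \mathbf{a}^*, r_{a^*}, (f'_{(\times,\ell)})_{\ell \in \hat\Lambda}, (f'_{(\varrho,\ell)})_{\ell \in \tilde\Lambda})$.

If $\mathcal{A}$ is successful, $(A, \tilde{A}) = \mathrm{Com}^1(\widetilde{ck}_1; \mathbf{a}; r_a)$, $(B, \hat{B}) = \mathrm{Com}^1(\widehat{ck}_1; \mathbf{b}; r_b)$,
$(B, \tilde{B}) = \mathrm{Com}^1(\widetilde{ck}_1; \mathbf{b}; r_b)$, $\psi^\times$ verifies, and for some $i \in [n]$, $a^*_{\varrho(i)} \neq T_\Lambda(i, \varrho) \cdot b_i$.
Since $\psi^\times$ verifies and the Hadamard product argument is (weakly) sound, we have
that $(A^*, \hat{A}^*)$ commits to $(T_\Lambda(\varrho^{-1}(1), \varrho) \cdot a_1, \dots, T_\Lambda(\varrho^{-1}(n), \varrho) \cdot a_n)$. (Otherwise, we
have broken the PSDL assumption.) Since $2\cdot\Lambda \cap \tilde\Lambda = \emptyset$, $\mathcal{A}'$ has expressed $F_\varrho(X)$ as a
polynomial $f(X)$ where at least for some $\ell \in 2\cdot\Lambda$, $X^\ell$ has a non-zero coefficient.

On the other hand, $\mathcal{A}$ also outputs $(f'_{(\varrho,\ell)})_{\ell \in \tilde\Lambda}$ s.t. $F_\varrho(x) = \log_{g_2} \psi^\varrho = f'_\varrho(x)$, where
all non-zero coefficients of $f'_\varrho(X) := \sum_{\ell \in \tilde\Lambda} f'_{(\varrho,\ell)} X^\ell$ correspond to $X^\ell$ for some $\ell \in \tilde\Lambda$.
Since $\Lambda$ is a progression-free set of odd integers and all elements of $2\cdot\Lambda$ are distinct,
then by the discussion in the beginning of Sect. 7, $\ell \notin 2\cdot\Lambda$. Thus, all coefficients
of $f'_\varrho(X)$ corresponding to any $X^\ell$, $\ell \in 2\cdot\Lambda$, are equal to 0. Thus, $f(X) \cdot X^{\lambda_n} =
\sum_{\ell \in \tilde\Lambda \cup (2\cdot\Lambda)} f_\ell X^{\ell + \lambda_n}$ and $f'_\varrho(X) \cdot X^{\lambda_n} = \sum_{\ell \in \tilde\Lambda} f'_{(\varrho,\ell)} X^{\ell + \lambda_n}$ are different polynomials with
$f(x) = f'_\varrho(x) = F_\varrho(x)$. Thus, $\mathcal{A}'$ has succeeded in creating a nonzero polynomial
$d_\varrho(X) = f(X) \cdot X^{\lambda_n} - f'_\varrho(X) \cdot X^{\lambda_n}$, such that $d_\varrho(x) = \sum_{\ell \in \tilde\Lambda \cup (2\cdot\Lambda)} d_\ell x^{\ell + \lambda_n} = 0$.

Next, $\mathcal{A}'$ can use an efficient polynomial factorization algorithm in $\mathbb{Z}_p[X]$ to
efficiently compute all $< 4\lambda_n + 1$ roots of $d_\varrho(X)$. For some root $y$, $g_1^{x^\ell} = g_1^{y^\ell}$. The
adversary $\mathcal{A}'$ sets $x \leftarrow y$, thus violating the $\tilde\Lambda$-PSDL assumption.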


Let $\Lambda$ be as described in Thm. 1. The CRS consists of $n^{1+o(1)}$ group elements. The
argument size of Prot. 2 is 2 elements from $\mathbb{G}_1$ and 4 elements from $\mathbb{G}_2$. The prover's
computational complexity is dominated by $O(n^2)$ scalar additions in $\mathbb{Z}_p$ and by $n^{1+o(1)}$
exponentiations in $\mathbb{G}_2$. The verifier's computational complexity is dominated by 12
bilinear pairings and $4n - 2$ bilinear-group multiplications.


8 New NIZK Argument for Circuit Satisfiability 


In a NIZK argument for circuit satisfiability (Circuit-SAT, well-known to be an
NP-complete language), the prover and the verifier share a circuit $C$. The prover aims
to prove in non-interactive zero-knowledge that she knows an assignment of input
values that makes the circuit output 1. As in [15], the Circuit-SAT argument will use
the Hadamard product argument, the permutation argument and a trivial argument for
element-wise sum of two tuples; in our case, all operating in parallel on $(2|C| + 1)$-dimensional
tuples, where $|C|$ is the circuit size. Those three arguments can be seen

System parameters: Define $\Lambda$ and $\hat\Lambda$ as in Prot. 1 and $\tilde\Lambda$ as in Prot. 2, but in all cases with $n$
replaced by $2|C| + 1$. Permutation swap.
CRS generation $\mathcal{G}_{crs}(1^\kappa)$: Let all other variables (including the secret ones) be defined as
in the CRS generation of Prot. 2, but let $crs^{perm}$ be the CRS of Prot. 2. In addition, let
$(\hat{D}, D_2) \leftarrow (\prod_{i=1}^{n} \hat{g}_{1,\lambda_i}, \prod_{i=1}^{n} g_{2,\lambda_i})$. The CRS is $crs \leftarrow (crs^{perm}, \hat{D}, D_2)$. Let $ck_1 \leftarrow
(gk; (g_{1\ell}, \hat{g}_{1\ell}, \tilde{g}_{1\ell})_{\ell \in \{0\} \cup \Lambda})$.
Common inputs: A satisfiable circuit $C$, and permutations $\tau$ and $\zeta$ generated based on $C$,
such that $(\mathbf{L}, \mathbf{R}, R_{n+1}, \mathbf{U}, \mathbf{X}, X_{n+1})$ is a “satisfying assignment”.
Argument generation $P(crs; C, (\mathbf{L}, \mathbf{R}, R_{n+1}, \mathbf{U}, \mathbf{X}))$: Denote $\mathbf{Y} := (Y_1, \dots, Y_n)$ for
$Y \in \{L, R, U, X\}$. The prover does the following.
1. Set $r_1, \dots, r_4 \leftarrow \mathbb{Z}_p$, and then compute $(lr, \hat{lr}, \tilde{lr}) \leftarrow
   \mathrm{Com}^1(ck_1; \mathbf{L}, \mathbf{R}, R_{n+1}; r_1)$, $lr_2 \leftarrow g_2^{r_1} \cdot \prod_{i=1}^{n} g_{2,\lambda_i}^{L_i} \cdot \prod_{i=1}^{n+1} g_{2,\lambda_{i+n}}^{R_i}$, $(rl, \tilde{rl}) \leftarrow
   \mathrm{Com}^1(\widetilde{ck}_1; \mathbf{R}, \mathbf{L}, R_{n+1}; r_1)$, $(rz, \hat{rz}) \leftarrow \mathrm{Com}^1(\widehat{ck}_1; \mathbf{R}, 0, \dots, 0, 0; r_2)$, $(uz, \hat{uz}) \leftarrow
   \mathrm{Com}^1(\widehat{ck}_1; \mathbf{U}, 0, \dots, 0, 0; r_3)$, $(ux, \hat{ux}, \tilde{ux}) \leftarrow \mathrm{Com}^1(ck_1; \mathbf{U}, \mathbf{X}, X_{n+1}; r_4)$.
2. Create an argument $\psi_1$ for $[(lr, \hat{lr})] \circ [(lr, \hat{lr}, lr_2)] = [(lr, \hat{lr})]$, $\psi_2$ for
   $swap([(rl, \tilde{rl})]) = [(lr, \hat{lr}, \tilde{lr})]$, $\psi_3$ for $[(rl, \tilde{rl})] \circ [(D, \hat{D}, D_2)] = [(rz, \hat{rz})]$, $\psi_4$ for
   $[(ux, \hat{ux})] \circ [(D, \hat{D}, D_2)/(g_{1,\lambda_n}, \hat{g}_{1,\lambda_n}, g_{2,\lambda_n})] = [(uz, \hat{uz})/(g_{1,\lambda_n}, \hat{g}_{1,\lambda_n})]$,
   $\psi_5$ for $[(rz, \hat{rz})] \circ [(lr, \hat{lr}, lr_2)] = [(D, \hat{D}) \cdot (uz^{-1}, \hat{uz}^{-1})]$, $\psi_6$ for $\tau([(lr, \tilde{lr})]) =
   [(lr, \hat{lr}, \tilde{lr})]$, and $\psi_7$ for $\zeta^{-1}([(ux, \tilde{ux})]) = [(lr, \hat{lr}, \tilde{lr})]$.
3. Send $\psi \leftarrow (lr, \hat{lr}, \tilde{lr}, lr_2, rl, \tilde{rl}, rz, \hat{rz}, uz, \hat{uz}, ux, \hat{ux}, \tilde{ux}, \psi_1, \dots, \psi_7)$ to the verifier.
Verification $V(crs; C, \psi)$: The verifier does the following:
- For $A \in \{lr, rz, uz, ux\}$ check that $\hat{e}(A, \hat{g}_2) = \hat{e}(\hat{A}, g_2)$.
- Check that $\hat{e}(g_1, lr_2) = \hat{e}(lr, g_2)$.
- For $A \in \{lr, rl, ux\}$ check that $\hat{e}(A, \tilde{g}_2) = \hat{e}(\tilde{A}, g_2)$.
- Verify all 7 arguments $\psi_1, \dots, \psi_7$ with corresponding inputs.


Protocol 3: New NIZK argument for Circuit-SAT


as basic operations in an NIZK “programming language” for all languages in NP. We 
show that a small constant number of such basic operations is sufficient for Circuit-SAT. 
The full argument then contains additional cryptographic sugar: a precise definition of 
the used CRS, computational/communication optimizations, etc. 

The first task is to express the underlying argument as a parallel composition of 
some addition, permutation and Hadamard product arguments. These arguments may 
include intermediate variables (that will be committed to by the prover) and constants 
(that can be online committed to by both of the parties separately). When choosing the 
arguments, one has to keep in mind that we work in an asymmetric setting. This may 
mean that for some of the inputs to the circuit satisfiability argument, one has to commit 
to them both in $\mathbb{G}_1$ and $\mathbb{G}_2$ (and the verifier has to check that this is done correctly).

The CRS is basically the CRS of the permutation argument. The total argument con- 
sists of commitments to intermediate variables and of all arguments in the program of 
this “programming language”. Finally, the verifier has to check that all commitments 
are internally consistent, and then verify all used arguments. 

Let us now turn to the concrete case of circuit satisfiability. For the sake of simplicity,
assume that the circuit $C$ is only composed of NAND gates. Let $C$ have $n$ gates. Assume
that the output gate of the circuit is $n$, and $U_n$ is the output of the circuit. For every gate
$j \in [n]$ of $C$, let the input wires of its $j$th gate be $L_j$ and $R_j$, and let $U_j$ be one of its
output wires. We also define an extra value $R_{n+1} = 1$. We let $X_j$ be other “output” wires
that correspond to some $L_k$ or $R_k$ that were not already covered by $U_j$ (that is, inputs to
the circuit, or duplicates of output wires). That is, $(U_1, \dots, U_n, X_1, \dots, X_{n+1})$ is chosen
so that for some permutation $\zeta$, $(\mathbf{U}, \mathbf{X}, X_{n+1})$ is a $\zeta$-permutation of $(\mathbf{L}, \mathbf{R}, R_{n+1})$,
where $\mathbf{Y} = (Y_1, \dots, Y_n)$ for $Y \in \{L, R, U, X\}$.

More precisely, the prover and the verifier share the following three permutations, the
first two of which completely describe the circuit $C$. First, $\tau \in S_{2n+1}$ is a permutation,
such that for any values $L_{i_1}, \dots, L_{i_s}, R_{j_1}, \dots, R_{j_t}$ that correspond to the same wire, $\tau$
contains a cycle $i_1 \to i_2 \to \dots \to i_s \to j_1 + n \to \dots \to j_t + n \to i_1$. For unique
wires $i$, $\tau(i) = i$. Second, $\zeta \in S_{2n+1}$ is a permutation that for every input wire (either
$L_i$ or $R_{i-n}$), outputs an index $j \leftarrow \zeta(i)$, such that the output wire $U_j$ or $X_{j-n}$ is equal
to that input wire. Third, $swap \in S_{2n+1}$ is a permutation, with $swap(i) = i + n$ and
$swap(i + n) = i$ for $i \in [n]$, and $swap(2n + 1) = 2n + 1$. Note that $swap = swap^{-1}$.
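
Of the three permutations, swap is fixed by $n$ alone and is easy to write down. The sketch below builds it as a 1-based dictionary and applies it with the convention $b_j = a_{\varrho(j)}$ of the permutation argument; the function names are hypothetical.

```python
def swap_permutation(n):
    """swap in S_{2n+1}: i <-> i+n for i in [n], with 2n+1 fixed (1-based)."""
    perm = {}
    for i in range(1, n + 1):
        perm[i] = i + n
        perm[i + n] = i
    perm[2 * n + 1] = 2 * n + 1
    return perm

def apply_perm(perm, values):
    """Return b with b_j = a_{perm(j)} for a 1-based permutation dictionary."""
    return [values[perm[j] - 1] for j in range(1, len(values) + 1)]
```

Applied to the tuple $(\mathbf{L}, \mathbf{R}, R_{n+1})$ this yields $(\mathbf{R}, \mathbf{L}, R_{n+1})$, and applying it twice returns the original tuple, in line with $swap = swap^{-1}$.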

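The argument is given by Prot. 3. In every subargument used in Prot. 3, the prover
and the verifier use a substring of crs as the CRS. The corresponding substrings are
easy to compute, and in what follows, we do not mention this issue. Instead of
computing two different commitments $\mathrm{Com}^1(\widehat{ck}_1; \mathbf{a}; r) = (g_1^{r} \prod g_{1,\lambda_i}^{a_i}, \hat{g}_1^{r} \prod \hat{g}_{1,\lambda_i}^{a_i})$ and
$\mathrm{Com}^1(\widetilde{ck}_1; \mathbf{a}; r) = (g_1^{r} \prod g_{1,\lambda_i}^{a_i}, \tilde{g}_1^{r} \prod \tilde{g}_{1,\lambda_i}^{a_i})$, we sometimes compute a composed
commitment $\mathrm{Com}^1(ck_1; \mathbf{a}; r) = (g_1^{r} \prod g_{1,\lambda_i}^{a_i}, \hat{g}_1^{r} \prod \hat{g}_{1,\lambda_i}^{a_i}, \tilde{g}_1^{r} \prod \tilde{g}_{1,\lambda_i}^{a_i})$. We assume that the
same value $\hat\alpha$ is used when creating product arguments and permutation arguments.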


Theorem 6. Let $\mathcal{G}_{bp}$ be $\tilde\Lambda$-PSDL secure, and $\Lambda$-PKE secure in both $\mathbb{G}_1$ and $\mathbb{G}_2$. Then
Prot. 3 is a perfectly complete, computationally adaptively sound and perfectly
zero-knowledge non-interactive Circuit-SAT argument.


Proof. PERFECT COMPLETENESS: follows from the perfect completeness of the 
Hadamard product and permutation arguments. 

ADAPTIVE COMPUTATIONAL SOUNDNESS: Let $\mathcal{A}$ be a non-uniform PPT adversary
that creates a circuit $C$ and an accepting NIZK argument $\psi$. By the $\Lambda$-PKE
assumption, there exists a non-uniform PPT extractor $X_{\mathcal{A}}$ that, running on the same
input and seeing $\mathcal{A}$'s random tape, extracts all openings. From the (weaker version
of) soundness of the product and permutation arguments and by the $\tilde\Lambda$-PSDL
assumption, it follows that the corresponding relations are satisfied between the opened
values. Moreover, by the $\tilde\Lambda$-PSDL assumption, the opened values belong to corresponding
sets $\hat\Lambda$ and $\tilde\Lambda$. Let $(\mathbf{L}, \mathbf{R}, R_{n+1})$ be the opening of $(lr, \hat{lr})$, where $\mathbf{L} = (L_1, \dots, L_n)$
and $\mathbf{R} = (R_1, \dots, R_n)$, and let $(U_1, \dots, U_n, X_1, \dots, X_n, X_{n+1})$ be the opening of
$(ux, \hat{ux})$. We now analyze the effect of every subargument in Prot. 3.

The successful verification of $\hat{e}(g_1, lr_2) = \hat{e}(lr, g_2)$ shows that $lr_2$ is correctly formed.
The first argument $\psi_1$ shows that $L_i, R_i \in \{0, 1\}$. The second argument $\psi_2$ shows that
$(rl, \tilde{rl})$ commits to $(\mathbf{R}, \mathbf{L}, R_{n+1})$. The third argument $\psi_3$ shows that $(rz, \hat{rz})$ commits to
$(\mathbf{R}, 0, \dots, 0, 0)$ and is thus consistent with the opening of $(lr, \hat{lr})$. The fourth argument
$\psi_4$ shows that $(uz, \hat{uz})$ commits to $(U_1, \dots, U_{n-1}, U'_n, 0, \dots, 0, 0)$ for some $U'_n$. It also
shows that $U_n \cdot 0 = U'_n - 1$, and thus $U'_n = 1$. (The value of $U_n$ is not important to get
soundness, since it is not used in any other argument.)
The fifth argument $\psi_5$ shows that the NAND gates are followed. That is, $\neg(L_i \wedge
R_i) = U_i$ for $i \in [n - 1]$. It also shows that the circuit outputs 1. Indeed, since $(uz, \hat{uz})$
commits to $(U_1, \dots, U_{n-1}, U'_n = 1, 0, \dots, 0, 0)$, then $(D, \hat{D}) \cdot (uz^{-1}, \hat{uz}^{-1})$ commits
to $(1 - U_1, \dots, 1 - U_{n-1}, 1 - 1 = 0, 0, \dots, 0, 0)$. Thus, the Hadamard product argument
verifies only if $L_i \cdot R_i = 1 - U_i$ for $i \in [n - 1]$, and $L_n \cdot R_n = 0$, that is, $\neg(L_n \wedge R_n) = 1$.

The sixth argument $\psi_6$ shows that if $i_1, \dots, i_s, j_1 + n, \dots, j_t + n$ correspond to the
same wire, then $L_{i_1} = \dots = L_{i_s} = R_{j_1} = \dots = R_{j_t}$, that is, the values are internally
consistent with the wires. The seventh argument $\psi_7$ shows that the “input wires” and
“output” wires are consistent.
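
The arithmetic constraints enforced by $\psi_1$ and $\psi_5$ can be stated on plain bit vectors. The following check, with a hypothetical function name and 0-based lists, mirrors the relations $L_i \cdot R_i = 1 - U_i$ for the first $n - 1$ gates and $L_n \cdot R_n = 0$ for the output gate. It ignores the wire-consistency conditions handled by $\psi_6$ and $\psi_7$.

```python
def nand_constraints_hold(L, R, U):
    """Check L_i*R_i == 1 - U_i for i < n, and L_n*R_n == 0 (output is 1)."""
    n = len(L)
    bits_ok = all(v in (0, 1) for v in L + R + U)
    gates_ok = all(L[i] * R[i] == 1 - U[i] for i in range(n - 1))
    return bits_ok and gates_ok and L[n - 1] * R[n - 1] == 0
```

The last entry of `U` is not inspected, matching the remark that the value of $U_n$ is not needed for soundness.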

PERFECT ZERO-KNOWLEDGE: we construct the following simulator $S = (S_1, S_2)$. The
simulator $S_1(1^\kappa, n)$ creates a correctly formed CRS together with a simulation trapdoor
$td = (\hat\alpha, \tilde\alpha, x) \in \mathbb{Z}_p^3$. The adversary then outputs a statement $C$ (a circuit) together with
a witness (a satisfying assignment) $w$. The simulator $S_2(crs; C, td)$ creates $(lr, \hat{lr}, \tilde{lr}, lr_2)$,
$(rl, \tilde{rl})$, $(rz, \hat{rz})$, $(uz, \hat{uz})$ and $(ux, \hat{ux})$ as commitments to $(0, \dots, 0)$. Due to the
knowledge of trapdoor $td$, the simulator can simulate all product and permutation arguments.
More precisely, he uses $L_i = R_i = U_i = U'_n = 1$ to simulate all product and
permutation arguments, except in the case of $\psi_5$ where he uses $U_i = U'_n = 0$ instead.
(Obviously, $(rz, \hat{rz})$ and $(uz, \hat{uz})$ commit to consistent tuples.)

To show that this argument $\psi''$ simulates the real argument $\psi$, note that $\psi$ is
perfectly indistinguishable from the simulated NIZK argument $\psi'$ where one makes
trapdoor commitments but opens them to real witnesses $L_i$, $R_i$ when making product and
permutation arguments. On the other hand, also $\psi'$ and $\psi''$ are perfectly
indistinguishable, and thus so are $\psi$ and $\psi''$.


Let $\Lambda$ be chosen as in Thm. 1. The CRS consists of $|C|^{1+o(1)}$ group elements. The
communication (argument length) of the argument in Prot. 3 is 18 elements from $\mathbb{G}_1$
and 21 elements from $\mathbb{G}_2$. The prover's computational complexity is dominated by
$\Theta(|C|^2)$ simple arithmetical operations in $\mathbb{Z}_p$ and $|C|^{1+o(1)}$ exponentiations in $\mathbb{G}_2$. The
verifier's computational complexity is dominated by 72 bilinear pairings and $8|C| + 8$
bilinear-group multiplications.

Moreover, the CRS depends on $\hat\Lambda \cup \tilde\Lambda$. Since 0 may or may not belong to $\tilde\Lambda$ (this
depends on the choice of $\Lambda$) and $\Lambda \cup 2\widehat{\ }\Lambda \subseteq \tilde\Lambda$, $\hat\Lambda \cup \tilde\Lambda = \{0\} \cup \tilde\Lambda$. Recalling that
elements of $\mathbb{G}_1$ can be represented by 512 bits and elements of $\mathbb{G}_2$ can be represented
by 256 bits, the communication (argument length) is $18 \cdot 512 + 21 \cdot 256 = 14592$ bits.


Point Obfuscation and 3-Round
Zero-Knowledge*


Nir Bitansky and Omer Paneth 


Tel Aviv University, Boston University 


Abstract. We construct 3-round proofs and arguments with negligible 
soundness error satisfying two relaxed notions of zero-knowledge (ZK): 
weak ZK and witness hiding (WH). At the heart of our constructions lie 
new techniques based on point obfuscation with auxiliary input (AIPO). 

It is known that such protocols cannot be proven secure using black- 
box reductions (or simulation). Our constructions circumvent these lower 
bounds, utilizing AIPO (and extensions) as the “non-black-box compo- 
nent” in the security reduction. 


1 Introduction 


Interactive proofs and arguments are fundamental notions in 
the theory of computation. In cryptography, these are typically used to prove 
NP-statements and the proof is required to maintain the prover’s privacy. Dif- 
ferent notions of privacy were considered, the most comprehensive one being 
zero-knowledge (ZK). ZK protocols allow proving an assertion without revealing 
anything but its validity. That is, the information learned by the verifier from 
the interaction can be simulated only from the (valid) statement itself. 

Since ZK was introduced [GMR85], questions regarding the round complexity 
of ZK protocols were studied extensively. While it is known that 2-round ZK 
protocols (with auxiliary input) do not exist for languages outside BPP [GO94], 
a classical open question is whether there exist 3-round ZK protocols for NP 
with negligible soundness error. The difficulty of this problem is expressed by 
the lower bound of [GK96]: there do not exist 3-round black-box ZK (BBZK) 
protocols with negligible soundness for languages outside BPP. Namely, to prove 
that a 3-round protocol is ZK, one must demonstrate a simulator that uses the 
verifier in a non-black-box way. 

Prior work shows that, using non-black-box simulation, it is possible
to go beyond existing black-box bounds. However, so far we do not know how
to use similar techniques to obtain 3-round ZK protocols. Nevertheless, 3-round
ZK protocols have been constructed based on non-standard “knowledge
assumptions”. In particular, a 3-round ZK argument is known based on the
knowledge of exponent assumption (KEA) and variants of it. A different
“knowledge assumption” was used to show the existence of 3-round ZK proofs
for NP [LM01]. (See further discussion in Section 1.2.)


* This research was funded by the Check Point Institute for Information Security, by 
In light of the difficulties in achieving 3-round ZK, it is natural to examine 
relaxations of ZK that might enable the construction of such protocols. We 
discuss several previously studied relaxations. 


Witness indistinguishability (WI). A protocol is WI if any two proofs
for the same statement that use two different witnesses are indistinguishable.
It is known that, while the parallel repetition of basic (3-round) ZK protocols
is not BBZK, it is WI. Furthermore, the soundness error decreases exponentially
in the number of repetitions. However, WI protocols do not always guarantee
witness secrecy; in particular, for statements with a unique NP-witness, WI is
meaningless. Nevertheless, WI can be used to achieve other notions
of secrecy such as ZK and witness-hiding.


Witness hiding (WH). Roughly speaking, a protocol is WH with respect
to a distribution $D$ on an NP-language $\mathcal{L}$ if no verifier can extract a witness
from its interaction with the honest prover on a common instance $x \leftarrow D$. For
WH to be meaningful, it should be restricted to hard distributions, namely,
distributions $D$ for which poly-size circuits cannot find a witness $w \in R_{\mathcal{L}}(x)$
for instances $x \leftarrow D$. WH is in a sense a “minimal” notion of privacy; indeed,
leaking the entire witness does not leave much room for imagination.

A known 3-round protocol with negligible soundness error is WH only
with respect to a specific type of (hard) distributions on languages where
every instance has two witnesses. In contrast, extending the lower bounds of
[GK96], the work of [HRS09] shows that, for distributions with unique witnesses,
3-round WH cannot be “black-box reduced” to any “standard cryptographic
assumption” (e.g., existence of OWFs), given natural limitations on the reduction.

In this work, we are interested in protocols that are WH with respect to all
hard distributions (including the unique witness case). We remark that
constructing WH protocols for restricted classes of distributions, where a lower
bound on their hardness is a priori known, is a relatively easy task (and is not
ruled out by [HRS09]). Indeed, using super-polynomial black-box reductions, it is
possible to obtain 3-round WH protocols with respect to super-polynomially hard
distributions. (For example, $f(n) = \omega(\log n)$ parallel repetitions of any 3-round
ZK protocol with constant soundness error is WH with respect to distributions
that are hard for $2^{f(n)}$-size adversaries.) Typical cryptographic scenarios,
however, do call for secrecy with respect to general languages/distributions where
no a priori super-poly hardness bound is known at the protocol’s design time.
Here, efficient reductions requiring non-black-box techniques are needed.


Weak zero-knowledge (WZK). The standard notion of ZK requires that for any 
(potentially adversarial) verifier there exist a simulator that simulates its view 
in an interaction with the honest prover. The simulated view should be indistin- 
guishable from the real one by any (efficient) distinguisher. The notion of WZK 
relaxes ZK by changing the order of quantifiers. Specifically, it allows 
the ZK simulator to depend on the particular distinguisher in question. 
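
The change in quantifier order can be written schematically as follows. The notation is ours and is introduced only for illustration: $\mathrm{View}_{V^*}$ denotes the verifier's view in a real interaction, and $\approx_D$ means that the distinguisher $D$ tells the two sides apart with at most negligible advantage.

```latex
\begin{align*}
\text{ZK:}  \quad & \forall V^* \;\; \exists S \;\; \forall D : \quad \mathrm{View}_{V^*} \approx_D S \\
\text{WZK:} \quad & \forall V^* \;\; \forall D \;\; \exists S_D : \quad \mathrm{View}_{V^*} \approx_D S_D
\end{align*}
```

One way to model the second line with a single simulator is to hand the simulator a description of $D$ as an additional input.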
While ZK is often used as a sub-protocol in larger systems, WZK is not always 
suitable for this purpose due to its weaker simulation guarantee. In particular, 
WZK is not known to be closed under sequential repetition. Nevertheless, WZK 
is useful in settings where the verifier tries to learn a specific type of information 
and we can present a distinguisher that can test whether the verifier succeeded in 
learning it. Examples include verifiers that try to learn a specific predicate of the 
witness, or any function of the witness that is efficiently verifiable. In particular, 
WZK implies WH (by considering a distinguisher that tests if the verifier’s view 
contains a valid witness). We note that, for black-box simulation, WZK and 
(standard) ZK coincide; hence, by [GK96], a 3-round protocol with negligible 
soundness error cannot even be shown to be WZK with black-box simulation. 

To sum up the above discussion, 3-round arguments with negligible sound- 
ness error that are ZK, WH or WZK cannot be constructed using black-box 
techniques. (From this point on, we only consider proofs/arguments with negligible
soundness error.) In light of the existing non-black-box constructions, it
is interesting to investigate which techniques and assumptions could suffice for 
constructing such protocols. Another interesting related question is understand- 
ing whether the relaxed notions of WH and WZK require simpler techniques 
than for full-fledged ZK; indeed, all existing WH constructions are based on the 
stronger notion of ZK as a building block. The question of finding “more direct” 
constructions of WH was already raised by [FS90]. This work sheds new light 
on both questions, introducing techniques based on point obfuscation (PO). We 
next briefly review the concept of PO. 


Point obfuscation and extensions. Informally, an obfuscator is a randomized 
algorithm O that gets as input a program C (given by a circuit) and outputs 
a new program O(C) that has the same functionality as the original one, but 
does not leak any additional information on C [BGI+01]. A stronger variant is 
obfuscation with auxiliary input, in which O(C) does not leak any information 
even given a related auxiliary input zc [GK05]. 

In this work, we consider obfuscation of point circuits and their extensions. A
point circuit $I_s$ outputs 1 on $s$ and $\bot$ on all other inputs. A multibit point circuit
$I_{s \to t}$ outputs $t$ on $s$ and $\bot$ otherwise. We also consider a new extension of point
circuits which we call circular point circuits: these are circuits $I_{s \leftrightarrow t}$ which output
$t$ on input $s$, $s$ on input $t$, and $\bot$ otherwise. Obfuscators for multibit point circuits
are called Digital Lockers (DL). We introduce the new notion of circular digital
lockers (CDL) that are obfuscators for circular point circuits. Point circuits and
their extensions are among the very few functionalities for which obfuscators have
been shown (albeit, typically, under rather strong hardness assumptions). So far,
however, POs have found only a handful of applications in cryptographic theory,
mostly to strong forms of encryption [CD08, CKVW10, BC10].
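
To make the three functionalities concrete, the following minimal sketch writes them as plain (unobfuscated) Python closures. The function names and the use of `None` for the $\bot$ output are our own illustrative choices; nothing here is hidden or obfuscated.

```python
from typing import Callable, Optional

# None stands in for the "bottom" symbol (the circuit rejects its input).
Circuit = Callable[[bytes], Optional[bytes]]


def point_circuit(s: bytes) -> Circuit:
    """I_s: outputs 1 on s and bottom on every other input."""
    return lambda x: b"\x01" if x == s else None


def multibit_point_circuit(s: bytes, t: bytes) -> Circuit:
    """I_{s->t}: outputs t on s and bottom otherwise."""
    return lambda x: t if x == s else None


def circular_point_circuit(s: bytes, t: bytes) -> Circuit:
    """I_{s<->t}: outputs t on s, s on t, and bottom otherwise."""
    def circuit(x: bytes) -> Optional[bytes]:
        if x == s:
            return t
        if x == t:
            return s
        return None
    return circuit
```

An obfuscator for any of these families would output a program with the same input/output behavior that hides $s$ (and $t$).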


1.1 Our Contribution 


We construct 3-round WH and WZK protocols based on two different variants 
of point obfuscation: 
— 3-round negligible soundness WH proofs for NP, given auxiliary input point 
obfuscators that satisfy a relatively mild distributive security requirement. 
The protocol is WH with respect to general hard distributions (including 
the unique witness case). 

— 3-round WZK arguments for NP, given auxiliary input digital lockers that 
satisfy a worst-case simulation security requirement. 


We next give an overview of our constructions, followed by a discussion on the 
nature of our obfuscation assumptions and how they relate to previous assump- 
tions used for 3-round ZK protocols. 


3-round witness-hiding. The high level idea behind our WH protocol is as
follows. Given an NP statement $x \in \mathcal{L}$, have the verifier V construct a modified
NP verification circuit $\mathsf{Ver}^{y}_{\mathcal{L},x}$ that on a valid witness $w \in R_{\mathcal{L}}(x)$ outputs a
secret random point $y$ and outputs $\bot$ otherwise. V then “garbles” this circuit
using Yao’s technique and both parties execute a 2-message oblivious-transfer
protocol, at the end of which the prover P possesses the garbled circuit and
the corresponding labels for the witness $w$. Next, P evaluates the circuit (on $w$)
and obtains the point $y$. (This is essentially a conditional disclosure of secrets
protocol, as termed by [AIR01], where P learns the output $y$ only
if it inputs a valid witness.) In the third message, P sends back to V a point
obfuscation of $y$. V accepts only after verifying it got a valid obfuscation of $y$.

Informally, soundness follows from the secrecy of the garbled circuit that 
prevents a dishonest prover from obtaining the random y in case there is no 
valid witness. In fact, we show that our protocol is a proof of knowledge. 

The witness-hiding property is based on the security of the underlying obfus- 
cator. To exemplify, consider a version of the protocol where P sends back y in 
the clear. Following is an attack on this simple version of the protocol. Consider a 
cheating verifier V* that, instead of garbling $\mathsf{Ver}^{y}_{\mathcal{L},x}$, garbles the identity circuit. 
P now evaluates the garbled circuit on w and obtains the point y = w. If P was 
to simply send back y in the clear, V* would have learned w and the protocol 
would be completely insecure. Instead, P sends back an obfuscation O(y). The 
security of the obfuscator O should then assure that V* cannot obtain w, unless 
“it was already known” to V* in advance. 


The security reduction and required obfuscation assumptions. As we have seen, 
the WH guarantee of our protocol depends on the security of the underlying point 
obfuscator O. We now discuss the properties of the obfuscation used to show 
WH. Concretely, our underlying obfuscator should satisfy a distributional
indistinguishability requirement with respect to points and related auxiliary
information that are jointly sampled from an unpredictable distribution. We say that
a distribution ensemble $D = \{(Z_n, Y_n)\}_{n \in \mathbb{N}}$ on pairs of strings is unpredictable
(UPD) if poly-size circuits cannot predict (with noticeable chance) the point $Y_n$,
given the potentially related auxiliary input $Z_n$. We say that O is a
distributional auxiliary input point obfuscator (AIPO) if, for any UPD $D = \{(Z_n, Y_n)\}$,
no poly-size circuit family can distinguish, given $Z_n$, an obfuscation $O(Y_n)$
from an obfuscation of a random point $O(U_n)$.
In our setting, $Z_n$ represents the common input $x$ and the prover’s first
message (during the OT protocol). $Y_n$ is the obfuscated point (returned by the
honest prover). That is, $Z_n$ is explicitly known to the verifier, while $Y_n$ is
obfuscated. A malicious V* might choose its (garbled) circuit to output illegitimate
information on the witness (i.e., information it could not predict on its own only
from $Z_n$); the obfuscation, however, should prevent it from doing so.


3-round weak zero-knowledge. The WH protocol described above is not ZK - it 
enables a cheating verifier V* to learn arbitrary predicates of the witness. For 
example, to learn $w_1$, the first bit of $w$, V* can maliciously choose its garbled
circuit to map $w$ to one of two arbitrary points $y_0, y_1$ according to $w_1$. In this
case, the honest prover sends an obfuscation $O(y_{w_1})$, and V* learns $w_1$ by simply
running the obfuscation on each of the two points $y_0, y_1$. (This can be generalized
to any function $f(w)$ where $|f(w)| = O(\log n)$, using a poly-size set $\{y_i\}$.)
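
The attack can be simulated in a few lines. This is a sketch under loud assumptions: a salted SHA-256 hash stands in for the point obfuscator O (a heuristic placeholder, not one of the constructions considered in this paper), the delegation step is abstracted away so that the honest prover simply evaluates whatever circuit the verifier chose, and "first bit" is taken to be the most significant bit of the first witness byte. All function names are hypothetical.

```python
import hashlib
import os
from typing import Callable, Tuple

Obfuscation = Tuple[bytes, bytes]  # (public salt, hash tag)


def obfuscate(y: bytes) -> Obfuscation:
    """Heuristic stand-in for a point obfuscation of y (assumption, not an AIPO)."""
    salt = os.urandom(16)
    return salt, hashlib.sha256(salt + y).digest()


def evaluate(obf: Obfuscation, x: bytes) -> bool:
    """Run the obfuscated point circuit on x."""
    salt, tag = obf
    return hashlib.sha256(salt + x).digest() == tag


def honest_prover_reply(circuit: Callable[[bytes], bytes], w: bytes) -> Obfuscation:
    """The honest prover evaluates the received circuit on w and obfuscates the result."""
    return obfuscate(circuit(w))


def learn_first_bit(w: bytes) -> int:
    """Cheating verifier: recover the first bit of w from the prover's reply."""
    y0, y1 = os.urandom(16), os.urandom(16)

    def malicious_circuit(witness: bytes) -> bytes:
        # Maps the witness to y1 if its first bit is 1, and to y0 otherwise.
        return y1 if witness[0] & 0x80 else y0

    reply = honest_prover_reply(malicious_circuit, w)
    # Testing the obfuscation on y1 reveals which point was obfuscated.
    return 1 if evaluate(reply, y1) else 0
```

The obfuscation hides the point itself, yet a verifier that already holds a small candidate set can test each candidate, which is exactly why the protocol is not ZK.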

Towards making the protocol ZK, we try to cope with the above attack by 
requiring that the verifier “proves” it “fully knows” the secret point y (rather 
than just a poly-size set containing y). To achieve this without adding rounds, we 
ask that the verifier itself includes an obfuscation of y in its message. The prover 
then checks the obfuscation’s consistency with the point extracted from the 
circuit evaluation. In case of inconsistency, the prover aborts. This modification, 
however, still does not prevent the above attack. The verifier V* can learn $w_1$
by sending an obfuscation of the string $y_0$ and observing whether the prover
aborts. Moreover, the protocol may no longer be sound since a cheating prover 
might use the verifier’s obfuscation to create an obfuscation of the same point y 
without “knowing” y. 

We resolve these issues as follows: (a) to regain soundness, we use an obfus- 
cation scheme with non-malleability properties, based on an obfuscated circular 
point circuit. (b) to achieve WZK, we require that, instead of a plain point 
obfuscation, the verifier sends an obfuscated multibit point circuit that on the 
secret input y outputs the coins used by the verifier to garble the circuit. Now, 
the prover can verify that the garbled circuit is indeed $\mathsf{Ver}^{y}_{\mathcal{L},x}$ (for some $y$). 

In order to show that the protocol is WZK, we use stronger notions of ob- 
fuscation. Since WZK requires worst-case simulation (i.e., simulation for any 
x), we require that our obfuscators also satisfy a worst-case simulation guaran- 
tee (rather than the weaker distributive definition used for WH). To simulate 
any verifier V*, our simulator must make use of the obfuscation simulator for
V*. However, an obfuscation simulator for general adversaries with long output
could not exist (see [BGI+01]); in fact, known constructions of PO only address
simulation of adversaries with a single output bit. To overcome this, we use the
fact that the WZK simulator is given a specific distinguisher D, and the
simulated verifier view only needs to fool this specific D. We show how to use
an obfuscation simulator for the binary adversary D(V*), which is the composi- 
tion of the distinguisher and the verifier, in order to construct a WZK simulator. 
Indeed, this limitation on simulating adversaries with long output is the reason 
we do not achieve full-fledged ZK. 
1.2 Reflections on the Use of Point Obfuscation 


The lower bounds discussed above imply that our 3-round protocols cannot be shown
secure using reductions that only make black-box use of the adversary. This is
not surprising: indeed, neither auxiliary input nor standard point obfuscators
can be shown to be secure using black-box reductions [Wee05]. Hence, our use
of obfuscation inherently implies that the verifier is not used as a black-box.

To demonstrate the non-black-box nature of POs, we briefly review the
techniques used in existing constructions [Wee05]. We can view POs as a
special case of AIPOs, where the auxiliary input $Z_n$ is empty. In this case, $Y_n$
is unpredictable if it is well-spread (i.e., has super-logarithmic min-entropy) and
the security requirement is that $O(Y_n) \approx_c O(U_n)$ for any well-spread $Y_n$.

The hardness assumptions made in these constructions are shown to imply that 
the strategy of any distinguisher essentially consists of a poly-size set of “dis- 
tinguishing elements”. That is, only obfuscations of points within this set are 
distinguishable from an obfuscation of a random point. However, these elements 
cannot be extracted using black-box access to the adversary. Hence, they are 
given to the reduction (or simulator) as non-uniform advice. 

These techniques allow achieving the stronger worst-case simulation defini- 
tion, thus showing that the distributive and worst-case definitions are in fact 
equivalent in the case of no auxiliary input. When considering auxiliary input, 
we can no longer apply these techniques. Indeed, the set of distinguishing el- 
ements can now depend on the auxiliary input in an arbitrary way. That is, 
no short advice suffices for the reduction to go through. In general, we do not 
know whether the distributive AIPO definition implies the worst-case simulation 
definition in the auxiliary input case (the converse still holds). 


Concrete constructions. There exist very few constructions that were shown to
be secure with respect to auxiliary input. It is known that any point obfuscator
is also secure with respect to auxiliary input that is chosen independently of the
obfuscated point. Another known construction satisfies, under a variant of the
LWE assumption, a restricted definition where the distribution $D$ is
“highly unpredictable”. Both results are insufficient for our needs.

In this work, we consider two concrete constructions of AIPOs based on two
different assumptions. The first AIPO, known as the $(r, r^x)$ obfuscator, was
suggested by Canetti [Can97] based on a strong variant of DDH. Informally, the
assumption states that there exists an ensemble of prime order groups
$\mathcal{G} = \{G_n : |G_n| = p_n\}$ such that for any unpredictable distribution $D = (Z_n, Y_n)$ with
support $\{0,1\}^{\mathrm{poly}(n)} \times \mathbb{Z}_{p_n}$: $(z, r, r^y) \approx_c (z, r, r^u)$, where $(z, y) \leftarrow (Z_n, Y_n)$,
$u \leftarrow \mathbb{Z}_{p_n}$ is uniform, and $r$ is a random generator of $G_n$.
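
The shape of the $(r, r^x)$ obfuscator can be sketched as follows. The group below is a deliberately tiny toy (our assumption, chosen only so that the code runs instantly): the squares modulo the safe prime 2039 form a group of prime order 1019. Such parameters offer no security, and the function names are hypothetical.

```python
import secrets
from typing import Tuple

# Toy parameters (assumption, for illustration only): P = 2Q + 1 with P, Q prime,
# so the squares modulo P form a group of prime order Q.
P = 2039
Q = 1019


def random_generator() -> int:
    """Sample a random generator of the order-Q subgroup of squares mod P."""
    h = 2 + secrets.randbelow(P - 3)  # h in [2, P-2], so h is neither 1 nor -1 mod P
    return pow(h, 2, P)               # a square other than 1, hence of order Q


def obfuscate(y: int) -> Tuple[int, int]:
    """Return the pair (r, r^y) for a fresh random generator r."""
    r = random_generator()
    return r, pow(r, y % Q, P)


def evaluate(obf: Tuple[int, int], x: int) -> bool:
    """Run the obfuscated point circuit: accept x iff r^x equals the stored r^y."""
    r, t = obf
    return pow(r, x % Q, P) == t
```

Because $r$ has prime order, $r^x = r^y$ holds exactly when $x = y$ in $\mathbb{Z}_Q$, which gives the point-circuit functionality; the secrecy of $y$ is what the DDH-style assumption above is meant to capture.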

For the second construction, we suggest a new assumption that is stated in
terms of uninvertibility rather than indistinguishability. The assumption
strengthens the assumption made by Wee [Wee05] to account for auxiliary inputs. Roughly,
to construct (non auxiliary input) POs, Wee assumes a strong one-way
permutation $f$ that is “uninvertible” with respect to all well-spread distributions.
A natural extension of the latter to the auxiliary input setting is to assume that
the permutation is hard to invert, even given side information $Z$ on the pre-image
$Y$, from which $Y$ cannot be predicted. An additional fact used by Wee is that
permutations inherently preserve (information-theoretic) entropy; in particular,
if $Y$ is well-spread, so is $f(Y)$. In the (computational) auxiliary input setting,
this might not be true; namely, it might be that $Y$ is unpredictable from $Z$,
while $f(Y)$ is predictable from $Z$. One possible way to deal with this issue is to
assume a trapdoor permutation family (with the above strong uninvertibility).
Further details can be found in the full version of this paper [BP11].

(Both prior works mentioned under “Concrete constructions”, one of which is
[DKL09], use a slightly different formulation of the distributional AIPO
requirement. Their formulation is essentially equivalent to ours.)

We remark that both the assumptions we consider (or any assumption that
states that a specific obfuscation candidate is an AIPO satisfying either the
worst-case or the distributive security definition) are considered to be
non-standard. In particular, any such assumption is non-falsifiable in the terms of
Naor.


Comparison with previous work on 3-round ZK. As already mentioned, it is
known how to construct 3-round ZK arguments and proofs using non-falsifiable
“knowledge assumptions,” such as the knowledge of exponent assumption (KEA)
[BP04], the POK assumption [LM01], or the existence of “extractable
perfect one-way functions” (EPOWF) [CD09].

KEA [Dam91] essentially asserts that any algorithm that
produces a DDH tuple must “know” the corresponding exponents. Upon the
formulation of KEA, a more general question was raised regarding the
existence of “sparse range one-way functions”, such that any algorithm that can
sample an element within the function’s image must also “know” a preimage
(KEA indeed yields such a OWF). The EPOWF primitive of [CD09] formalizes
this generalization. All in all, the above assumptions essentially fall under the
abstract notion of EPOWF. (Indeed, it is known that either one of the KEA
or the POK assumptions implies the EPOWF primitive, when combined with a
hardness assumption such as DDH.)

In this work, we show how to circumvent the black-box impossibility results for 
3-round WZK and WH based on a different set of primitives; namely, (variants 
of) point obfuscation with auxiliary input. Currently, we do not know of any 
formal relation between the AIPO and EPOWF primitives, beyond the relation 
established in this work (through 3-round ZK). Formalizing such a relation is an 
interesting question on its own (going beyond the scope of 3-round ZK). 


On the efficiency of the construction. Basing our constructions on (Yao-based)
secure function evaluation results in efficient protocols with a practical
implementation (similarly to [IKOS07]). By working directly with the verification
circuit $\mathsf{Ver}_{\mathcal{L}}$, we avoid the overhead of Karp reductions, existing in most ZK
protocols. Specifically, we can achieve communication complexity $O(ns)$, where $n$
is the security parameter and $s$ is the size of $\mathsf{Ver}_{\mathcal{L}}$. This is not optimal as there
exist ZK arguments with polylog communication complexity [Kil92]. However,
these require using PCPs, making them less practical.


Finally, we consider the techniques in use. Unlike previous works, our work 
demonstrates a direct WH construction that is not based on a ZK protocol. We 
then strengthen it to a limited form of ZK. Our WH to WZK transformation is 
specifically tailored for our construction. An interesting open question is whether 
a general transformation of this type exists. 


Organization. In Section 2 we present the main definitions and tools used in this
work. In Section 3 and Section 4 we introduce our WH and WZK protocols. For
lack of space many of the details and proofs are omitted and can be found in the
full version of this paper [BP11].


2 Definitions and Tools 


2.1 Weak Zero-Knowledge and Witness Hiding 


In this work, we discuss two relaxations of ZK which are formalized next. 


Weak zero-knowledge. In ZK, we require that the view of any verifier V*, in an 
interaction with the honest prover P, can be simulated by an efficient simulator 
S. The simulated view should be indistinguishable from the view of V* for any 
poly-size distinguisher. In weak ZK (WZK), the simulator is only required to 
output a view that is indistinguishable from that of V* for a specific distinguisher. 
This is modeled by supplying the simulator with the distinguisher circuit as 
additional auxiliary input. 


Definition 2.1 (Weak zero-knowledge). An argument system (P, V) is WZK
if for every PPT verifier V* there exists a PPT simulator S such that for every
poly-size circuit family of distinguishers $D = \{D_n\}_{n \in \mathbb{N}}$ and any $x \in \mathcal{L} \cap \{0,1\}^n$,
$w \in R_{\mathcal{L}}(x)$, $z \in \{0,1\}^{\mathrm{poly}(n)}$ it holds that:


$$\left| \Pr[D_n(\langle P(w), V^*(z) \rangle(x)) = 1] - \Pr[D_n(S(D_n, x, z)) = 1] \right| \leq \mathrm{negl}(n) .$$


Witness-hiding. A protocol is WH if the verifier cannot fully learn a witness from 
its interaction with P. This requirement is restricted to instances and witnesses 
$(x, w)$ sampled from “hard distributions”. 


Definition 2.2 (Hard distribution). Let $D = \{D_n\}_{n \in \mathbb{N}}$ be an efficiently
samplable distribution ensemble on $R_{\mathcal{L}}$, i.e., the support of $D_n$ is $\mathrm{Supp}(D_n) =
\{(x, w) : x \in \mathcal{L} \cap \{0,1\}^n, w \in R_{\mathcal{L}}(x)\}$. We say that $D$ is hard if for any poly-size
circuit family $\{C_n\}$ and sufficiently large $n$ it holds that:


$$\Pr_{(x,w) \leftarrow D_n} [C_n(x) \in R_{\mathcal{L}}(x)] \leq \mathrm{negl}(n) .$$
Definition 2.3 (D-witness-hiding). An argument system (P, V) for an NP
language $\mathcal{L}$ is WH with respect to a hard distribution $D = \{D_n\}_{n \in \mathbb{N}}$, if for any
poly-size verifier V* and all large enough $n \in \mathbb{N}$:


$$\Pr_{(x,w) \leftarrow D_n} [\langle P(w), V^* \rangle(x) \in R_{\mathcal{L}}(x)] \leq \mathrm{negl}(n) .$$


We say that (P, V) is WH if it is D-WH for every hard distribution D.


As discussed in the introduction, in this work we are interested in WH
protocols (with respect to every hard distribution), and not in protocols
that are WH with respect to a specific hard distribution.


2.2 2-Message Delegation 


A central tool used in our constructions is a 2-message delegation protocol in 
which the prover and verifier jointly evaluate the NP verification circuit of the 
language on the common instance and the prover’s witness. We use this primitive 
(following the formulation in [IP07]) to abstract the use of Yao’s garbled 
circuit construction. 

A 2-message delegation protocol is executed by parties (A, B) where A has 
an input x, and B has as input a function f (given by a boolean circuit). The 
protocol should allow A to obtain $f(x)$ using two messages: $A \to B \to A$, without 
compromising the input secrecy of either party. We additionally require that, 
given B’s message and secret randomness, one can reconstruct f. The protocol 
is defined by a tuple of algorithms (Gen, Enc, Eval, Dec, Open) and proceeds as 
follows: 


A: Obtains a key $sk \leftarrow \mathsf{Gen}(1^n)$, computes an encryption of its input
$c \leftarrow \mathsf{Enc}(sk, x)$, and sends $c$.

B: Computes an encrypted output $\hat{c} \leftarrow \mathsf{Eval}(c, f)$ using randomness $r$, and sends
back $\hat{c}$.

A: Outputs $y = \mathsf{Dec}(sk, \hat{c})$.


We briefly describe the security properties required from 2-message delegation 
schemes in this work: 


— Correctness: When both parties are honest A outputs f(x). 

— Input Hiding: An adversarial B cannot learn A’s input x (in the semantic 
security sense). 

— Function Hiding: An adversarial A learns nothing about B’s input f, other 
than the value of $f(x)$ (security in this case is simulation based). 

— Function Binding: In a later stage, B can reveal its input function f by 
exhibiting its random coins. We require that for any message sent by B, it 
can reveal at most one function. While function-binding is not required in 
common formulations of delegation protocols, we show that a Yao-based
construction (when instantiated with natural forms of encryption) has this
property.
In the full version of this paper [BP11], we provide a formal definition of secure
2-message delegation and describe a concrete instantiation based on Yao’s garbled
circuit technique and 2-message OT. We also define an information-theoretic
version of this primitive, which we use in order to achieve a WH protocol with
unconditional soundness (i.e., a proof).


2.3 Point Obfuscation with Auxiliary Input 


We start by recalling the standard definition for circuit obfuscation with auxiliary 
input. The definition is a worst-case definition, in the sense that simulation must 
hold for any circuit in the family and any related auxiliary input. 


Definition 2.4 (Worst-case obfuscator with auxiliary input [BGI+01,
GK05]). A PPT O is an obfuscator with auxiliary input for an ensemble
$\mathcal{C} = \{\mathcal{C}_n\}_{n \in \mathbb{N}}$ of families of poly-size circuits if it satisfies:


— Functionality. For any $n \in \mathbb{N}$, $C \in \mathcal{C}_n$, $O(C)$ is a circuit that computes the
same function as $C$.

— Polynomial slowdown. For any $n \in \mathbb{N}$, $C \in \mathcal{C}_n$, $|O(C)| \leq \mathrm{poly}(|C|)$.

— Virtual black box. For any PPT adversary A there is a PPT simulator S
such that for all sufficiently large $n \in \mathbb{N}$, $C \in \mathcal{C}_n$ and $z \in \{0,1\}^{\mathrm{poly}(n)}$:


$$\left| \Pr[A(z, O(C)) = 1] - \Pr[S^{C}(z, 1^{|C|}) = 1] \right| \leq \mathrm{negl}(n) ,$$


where the probability is taken over the coins of A, S and O.


An obfuscator O is recognizable if given a program $C$ and an alleged obfuscation
of $C$, $\tilde{C}$, it is easy to verify that $C$ and $\tilde{C}$ compute the same function.


— Recognizability. There exists a polynomial time recognition algorithm $\mathcal{V}$
such that for any $C \in \mathcal{C}_n$:
  • $\Pr_{O} [\mathcal{V}(C, O(C)) = 1] = 1$.
  • For any $\tilde{C} \in \{0,1\}^{*}$, if $\mathcal{V}(C, \tilde{C}) = 1$ then $C$ and $\tilde{C}$ compute the same
    function.


Point obfuscation. We consider obfuscation of point circuits and their extensions. 
A point circuit $I_s$ outputs 1 on string $s$ and $\bot$ on all other inputs. 


Definition 2.5 (Worst-Case auxiliary-input point obfuscation (AIPO)).
A PPT algorithm O is a worst-case AIPO if it is a recognizable obfuscator
(according to Definition 2.4) for the circuit ensemble: $\mathcal{C} = \{\mathcal{C}_n = \{I_s \mid s \in \{0,1\}^n\}\}_{n \in \mathbb{N}}$.


Remark 2.1. The notion of recognizable obfuscation was not explicitly defined 
in previous works. We only consider this property in the context of point obfus- 
cation. While, in general, point obfuscators are not required to be recognizable, 
previously constructed obfuscators are trivially recognizable. 
This is due to the fact that they use public randomness, i.e., the randomness 
used by the obfuscator appears in the clear as part of the obfuscated circuit. 
The recognition algorithm, given a program and its obfuscation, can simply 
rerun the obfuscation algorithm with the public randomness and compare the 
result to the obfuscation in hand. 
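
The rerun-and-compare recognition procedure can be sketched as below. As a loudly stated assumption, a salted SHA-256 hash plays the role of the obfuscator (a heuristic stand-in, not a construction from this paper); the point is only that the coins are stored in the clear, so recognition is a deterministic recomputation. The names `Obfuscation`, `obfuscate`, and `recognize` are hypothetical.

```python
import hashlib
import os
from typing import NamedTuple


class Obfuscation(NamedTuple):
    coins: bytes  # public randomness, stored in the clear
    tag: bytes    # value derived from the coins and the hidden point


def obfuscate(s: bytes, coins: bytes) -> Obfuscation:
    """Deterministic given (s, coins); a hash is used as a heuristic placeholder."""
    return Obfuscation(coins, hashlib.sha256(coins + s).digest())


def recognize(s: bytes, alleged: Obfuscation) -> bool:
    """Rerun the obfuscator on s with the public coins and compare the results."""
    return obfuscate(s, alleged.coins) == alleged
```

For example, `recognize(s, obfuscate(s, os.urandom(16)))` always accepts, while an obfuscation of a different point under the same coins is rejected.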
We next present a weaker distributional definition for point obfuscation with
auxiliary input that previously appeared in [Can97] (in a slightly different
formulation). We first give a preliminary definition of unpredictable distributions
(generalizing Definition 2.2) and then present the obfuscation definition.


Definition 2.6 (Unpredictable distribution). A distribution ensemble
$D = \{D_n = (Z_n, Y_n)\}_{n \in \mathbb{N}}$ on pairs of strings is unpredictable if no poly-size circuit
family can predict $Y_n$ from $Z_n$. That is, for every poly-size circuit family $\{C_n\}_{n \in \mathbb{N}}$
and for all large enough $n$:


$$\Pr_{(z,y) \leftarrow D_n} [C_n(z) = y] \leq \mathrm{negl}(n) .$$

Definition 2.7 (Auxiliary input point obfuscation for unpredictable
distributions (AIPO)). A PPT algorithm O is a point obfuscator for
unpredictable distributions if it satisfies the functionality and polynomial slowdown
requirements as in Definition 2.4, and the following secrecy property. For any
unpredictable distribution $D = \{D_n = (Z_n, Y_n)\}$ over $\{0,1\}^{\mathrm{poly}(n)} \times \{0,1\}^n$ it
holds that:

$$\{z, O(y) : (z, y) \leftarrow D_n\}_{n \in \mathbb{N}} \approx_c \{z, O(u) : z \leftarrow Z_n, u \leftarrow \{0,1\}^n\}_{n \in \mathbb{N}} .$$

Remark 2.2. Using this definition in our WH construction, we can settle for a
slightly relaxed definition with bounded auxiliary input; namely $|Y_n| = \omega(|Z_n|)$.
We do not know if such a bounded form of auxiliary-input indeed weakens the
requirement. However, it does seem to withstand certain “diagonalization
attacks” that can be performed for the non-restrictive definition (under certain
obfuscation assumptions).


2.4 Digital Lockers and Circular Digital Lockers 


We also consider obfuscation of several extensions of point circuits. Specifically,
we consider multibit point circuits and circular point circuits. A multibit point
circuit $I_{s \to t}$ outputs $t$ on $s$ and $\bot$ otherwise. A circular point circuit $I_{s \leftrightarrow t}$ outputs $t$ on input
$s$, $s$ on input $t$, and $\bot$ otherwise. Obfuscators satisfying the worst-case AIPO
definition (Definition 2.5) for multibit point circuits and circular point circuits
are called digital lockers (DLs) and circular digital lockers (CDLs).


Definition 2.8 (Digital locker (DL)). A PPT algorithm is a DL if it is a
recognizable obfuscator (according to Definition 2.4) for the circuit ensemble:
$\mathcal{C} = \{\mathcal{C}_n = \{I_{s \to t} \mid s, t \in \{0,1\}^n\}\}_{n \in \mathbb{N}}$.


Definition 2.9 (Circular digital locker (CDL)). A PPT algorithm is a CDL
if it is a recognizable obfuscator (according to Definition 2.4) for the circuit
ensemble: $\mathcal{C} = \{\mathcal{C}_n = \{I_{s \leftrightarrow t} \mid s, t \in \{0,1\}^n\}\}_{n \in \mathbb{N}}$.


Remark 2.3. We note that the “security under circularity” feature, which is
inherently provided by the strong obfuscation guarantees, was already considered in
previous work for constructing strong encryption schemes which withstand key
dependent messages and related key attacks [CKVW10, BC10].
While AIPOs are sufficient for our WH protocol, our WZK protocol requires DLs
and CDLs. In the full version of this paper [BP11], we describe how DLs and
CDLs can be constructed based on worst-case AIPOs that satisfy an additional
property of composability.


3 3-Round WH 


Overview of the protocol. As a warmup, consider first the following unsound
protocol: to prove an NP statement $x \in \mathcal{L}$, the prover P and verifier V first engage
in a 2-message delegation protocol where P’s (secret) input is the witness $w$ and
V’s input function is the NP verification circuit $\mathsf{Ver}_{\mathcal{L},x}$. P obtains the result
$\mathsf{Ver}_{\mathcal{L},x}(w)$ and sends it to V. This is unsound since a cheating prover can always
send “1” as its last message.

To make the protocol sound, we augment it as follows. Let $\mathsf{Ver}^{y}_{\mathcal{L},x}$ be a circuit
that outputs $y$ on valid witnesses and $\bot$ otherwise. Now, V will choose a secret
string $y \in_R \{0,1\}^n$ and use the circuit $\mathsf{Ver}^{y}_{\mathcal{L},x}$ as its secret input in the delegation
protocol. In order to convince V of the statement, P should send back $y$. Indeed,
in case $x \notin \mathcal{L}$ we have $\mathsf{Ver}^{y}_{\mathcal{L},x} \equiv \bot$, and hence, the “function hiding” property of
the delegation protocol assures that P does not learn the random $y$.

However, this protocol is not witness hiding. Indeed, a cheating verifier can 
try to obtain w by maliciously choosing its input function. For instance, choosing 
the function to be the identity results in the prover sending back w. 

A natural approach towards fixing the latter problem would be to have the 
verifier “prove” it behaved honestly, without revealing its secret. In other words, 
it should give a round-efficient witness-hiding proof, which is what we set out to 
do to begin with. Thus, we take a different approach. We note that an honest 
verifier that “knows” y should only be able to verify that the prover “knows” it 
as well; hence, it suffices to have the prover send a point obfuscation of y, instead 
of sending y in the clear. The security of the obfuscation would then guarantee 
that any information that the verifier learns on w could also be learned (with 
noticeable probability) without the obfuscation. 


The protocol. Let DEL = (Gen, Enc, Eval, Dec, Open) be a secure 2-message
delegation protocol and let O be a point obfuscator for unpredictable distributions
(AIPO) with recognition algorithm $\mathcal{V}$. The protocol is given in Figure 1.


Theorem 3.1. Let DEL be a secure 2-message delegation protocol, and let O be
an AIPO. Protocol 1 is a WH interactive argument.


We briefly overview the proof of Theorem 3.1. The full proof, as well as an
extension from an argument to a proof, can be found in the full version of this
paper [BP11].


Common Input: $x \in \mathcal{L}$. Auxiliary Input to P: $w \in R_{\mathcal{L}}(x)$.

1. P: Obtains $sk \leftarrow \mathrm{Gen}(1^n)$ and sends $c = \mathrm{Enc}(sk, w)$.
2. V: Samples $y \in_R \{0,1\}^n$, obtains $\hat{c} \leftarrow \mathrm{Eval}(c, \mathrm{Ver}^y_{\mathcal{L},x})$ and sends $\hat{c}$.
3. P: Decrypts $\tilde{y} = \mathrm{Dec}(sk, \hat{c})$, computes a point obfuscation $O(\tilde{y})$ and sends it.
4. V: Accepts iff $\mathcal{V}(I_y, O(\tilde{y})) = 1$, i.e., $O(\tilde{y})$ computes the same function as $I_y$.

Fig. 1. Protocol 1: 3-round Witness Hiding


Soundness. The soundness of Protocol 1 follows from the function hiding of
the underlying delegation scheme DEL and the recognizability of the point
obfuscator. Indeed, in case there is no valid witness the verifier’s message reveals
no information regarding the verifier’s secret random point y. Specifically, the
prover’s view can be simulated independently of y. Since the obfuscation is
recognizable, in order to fool the verifier, the prover must send a point circuit
computing $I_y$, and can only succeed with negligible probability.
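The message flow of Protocol 1 can be sketched as follows. This is a toy model under loudly stated assumptions: `ToyDelegation` is a hypothetical trusted-party stand-in that provides the Gen/Enc/Eval/Dec interface but none of the hiding properties of a real delegation scheme, the point obfuscator is replaced by a salted hash, and the NP relation is "w is a SHA-256 preimage of x". None of these choices come from the paper.

```python
import hashlib
import secrets

class ToyDelegation:
    """Trusted-party stand-in for DEL: ciphertexts are opaque handles."""
    def __init__(self):
        self._store = {}

    def gen(self) -> bytes:
        return secrets.token_bytes(16)

    def enc(self, sk: bytes, m):
        handle = secrets.token_hex(8)
        self._store[handle] = (sk, m)
        return handle

    def eval(self, c, circuit):
        # Apply the verifier's circuit to the hidden plaintext.
        sk, m = self._store[c]
        return self.enc(sk, circuit(m))

    def dec(self, sk: bytes, c):
        key, m = self._store[c]
        if key != sk:
            raise ValueError("wrong secret key")
        return m

def obfuscate_point(y: bytes):
    """Salted-hash stand-in for the point obfuscation O(y)."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.sha256(salt + y).digest()

def recognize(y: bytes, obf) -> bool:
    """Stand-in for the recognition algorithm applied to (I_y, O)."""
    salt, digest = obf
    return hashlib.sha256(salt + y).digest() == digest

def make_ver(x: str, y: bytes):
    """Ver^y_{L,x}: return y on a valid witness and None (for ⊥) otherwise."""
    return lambda w: y if hashlib.sha256(w).hexdigest() == x else None

def run_protocol(x: str, w: bytes, n: int = 16) -> bool:
    del_ = ToyDelegation()
    sk = del_.gen()
    c = del_.enc(sk, w)                      # message 1: P -> V
    y = secrets.token_bytes(n)               # verifier's secret point
    c_hat = del_.eval(c, make_ver(x, y))     # message 2: V -> P
    y_tilde = del_.dec(sk, c_hat)
    if y_tilde is None:
        # Without a valid witness the prover can only guess a point.
        y_tilde = secrets.token_bytes(n)
    obf = obfuscate_point(y_tilde)           # message 3: P -> V
    return recognize(y, obf)
```

With a valid witness the prover recovers y and the verifier accepts; with an invalid one the prover sees only ⊥ and must guess an n-byte point, which is the intuition behind the soundness argument.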


Proof of Knowledge. In fact, we can show that our WH protocol satisfies a 
stronger soundness property, namely it is a proof of knowledge. For this purpose, 
we use a similar idea to the one in the “knowledge attack” that shows why the 
protocol is not ZK (described in the introduction). In order to extract a witness, 
we essentially apply this attack repeatedly “against” the prover, revealing the 
witness’ bits one by one. Our extractor only makes black-box use of the prover 
and extracts the witness using rewinding. 


Witness hiding. The WH property is based on the input hiding of the delegation 
scheme DEL and the indistinguishability with respect to unpredictable distribu- 
tions guarantee of the AIPO O. Concretely, we show how any V* that manages 
to extract a witness w from its interaction with P can be used to break the input 
hiding property of DEL. The reduction samples (x, w) from the hard distribution
and submits $c_0 = w$, $c_1 = 1^{|w|}$ to the challenger. Upon receiving a challenge
$c = \mathrm{Enc}(sk, c_b)$, it simulates V*(x) with c as the first message. V* then generates
its own message $\hat{c}$, and it is left to simulate the last obfuscation message. To
do so, we treat two cases, corresponding to whether the secret point y (induced
by V*’s choice of input circuit to DEL) is (a) unpredictable from (x, c) or (b)
predictable by some poly-size predictor $\Pi$. Intuitively, the first corresponds to a
verifier that chooses its input circuit maliciously to gain information on w. The
second corresponds to a verifier that chooses its circuit honestly. To simulate
the obfuscation in the second case, we apply the prediction circuit to compute
$y \leftarrow \Pi(x, c)$ and feed V* with O(y). In the case that y is unpredictable, we feed
V* an obfuscation O(u) of a random point u. Finally, when V* outputs w, we
check whether it is a valid witness, and if so answer the challenger with b = 0.
Otherwise, we guess b at random. Indeed, by the indistinguishability guarantee
of the AIPO, in case b = 0 (i.e., the simulation is done with an encryption of
w) the simulated V* will manage to extract a witness with noticeable probability
(related to the prediction probability of $\Pi$ and the success probability of V*
in a true interaction). In case b = 1, the reduction is unlikely to produce a
valid witness since its view is completely independent of w and the underlying
distribution is hard. We stress that the reduction is, indeed, not black-box in V*;
in particular, it applies the predictor $\Pi$ implied by the AIPO guarantee, which
is not black-box in V*.


On restricted auxiliary input. In our WH protocol we require the AIPO dis- 
tributional guarantee to hold with respect to any unpredictable distribution. 
However, we can in fact settle for less. Specifically, the auxiliary input
distribution in Protocol 1 is essentially restricted to a very “benign” form, namely, the
first delegation message (ciphertext) and the hard instance x; in particular, the
auxiliary input is of fixed polynomial size and can be made much shorter than
the obfuscated random point.


Why isn’t Protocol 1 ZK? Protocol 1 is not ZK and in fact enables a cheating
verifier V* to learn arbitrary predicates on the witness. Specifically, V* can
deviate from the protocol by maliciously selecting its input circuit C for the
delegation protocol as follows. Let $B : \{0,1\}^* \to \{0,1\}^t$ be a polynomial-time
computable function with $t = O(\log n)$ output bits. To learn B(w), V* fixes
an arbitrary set of strings $Y = \{y_i\}_{i \in \{0,1\}^t}$ and sets its input circuit $C = C_B$
to map the witness w to $y_{B(w)}$. Indeed, given an obfuscation of $C_B(w)$, V* can
simply run the obfuscation on all points in $\{y_i\}$ and learn B(w). In the following
section we explain how to transform Protocol 1 into a WZK protocol.
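The attack just described can be illustrated with a small sketch. The assumptions are all hypothetical: the point obfuscator is a salted hash, the delegation layer is skipped (the prover applies the verifier's circuit to w directly), and the predicate B returns the two low-order bits of the first witness byte, so t = 2.

```python
import hashlib
import secrets

def obfuscate_point(y: bytes):
    """Salted-hash stand-in for a point obfuscation O(y)."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.sha256(salt + y).digest()

def evaluate(obf, x: bytes) -> bool:
    """Run the obfuscated point circuit on x."""
    salt, digest = obf
    return hashlib.sha256(salt + x).digest() == digest

def predicate(w: bytes) -> int:
    """Illustrative B(w): the two low-order bits of the first witness byte."""
    return w[0] & 0b11

def honest_prover(w: bytes):
    # In the real protocol the circuit is evaluated under DEL and the prover
    # decrypts the result; here the prover applies the circuit directly.
    return lambda circuit: obfuscate_point(circuit(w))

def attack(prover_reply, t: int = 2):
    """Cheating verifier: learn B(w) from the prover's last message."""
    points = [secrets.token_bytes(16) for _ in range(2 ** t)]   # the set Y
    malicious_circuit = lambda w: points[predicate(w)]          # C_B
    obf = prover_reply(malicious_circuit)                       # O(C_B(w))
    for i, y in enumerate(points):
        if evaluate(obf, y):
            return i
    return None
```

Because t is logarithmic in n, the loop over all $2^t$ candidate points runs in polynomial time, which is why the leak is limited to short predicates per execution.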


4 3-Round WZK 


Overview of the protocol. To make Protocol 1 WZK, we try to cope with verifiers
executing the “malicious circuit choice attack” described in the previous section.
As explained in the introduction, this involves two main modifications:


1. We require that the verifier’s message also includes a digital locker $\mathrm{DL}(I_{y \to r_V})$,
which on the secret input y “unlocks” the secret coins $r_V$ used by the verifier
in the delegation protocol. Upon receiving this message, the honest prover
P applies Dec as in the previous protocol, obtains y, and then retrieves the
coins $r_V$. Now P can apply the Open algorithm of the delegation to verify
that the input circuit of V* was honestly chosen (to be $\mathrm{Ver}^y_{\mathcal{L},x}$). In case it was
not, P returns a circular digital locker (CDL) (Definition 2.9) of a randomly
selected circular point circuit.

2. The prover is required to send back an obfuscation of y (as in the previous
protocol). However, to maintain soundness we should prevent a malicious
prover from using (or mauling) the verifier’s message $\mathrm{DL}(I_{y \to r_V})$ to get the
required obfuscation. For this purpose, we apply a “non-malleable
obfuscation scheme”, implemented as follows.† In its first message, the prover
commits to a random $r \in \{0,1\}^n$ (by sending the image of r under some
injective OWF f). Then in the last message, it sends a circular digital locker
$\mathrm{CDL}(I_{y \leftrightarrow r})$ that “binds” r and the secret point y. The honest verifier then
runs the CDL on y, retrieves r and uses the CDL recognition algorithm to
validate the CDL.


† We only consider a very restricted form of non-malleability where the adversary tries
to copy an obfuscation of the same point. A more general notion of non-malleable
obfuscation can be found in [CV08].


We now fully describe the protocol and then turn to analyze it. 


The protocol. Let DEL = (Gen, Enc, Eval, Dec, Open) be a secure 2-message
delegation scheme. Let DL, CDL be a digital locker and a circular digital locker.
Let $\mathcal{V}$ be the recognition algorithm for the CDL. Let f be an injective one-way
function. The protocol is presented in Figure 2.


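Common Input: $x \in \mathcal{L}$. Auxiliary Input to P: $w \in R_{\mathcal{L}}(x)$.

1. P: Obtains $sk \leftarrow \mathrm{Gen}(1^n)$ and $c \leftarrow \mathrm{Enc}(sk, w)$, samples $r \in_R \{0,1\}^n$, sends c
and f(r).
2. V: Samples $y \in_R \{0,1\}^n$, obtains $\hat{c} \leftarrow \mathrm{Eval}(c, \mathrm{Ver}^y_{\mathcal{L},x})$ using random coins $r_V$,
sends $\hat{c}$ and $\mathrm{DL}_V = \mathrm{DL}(I_{y \to r_V})$.
3. P: Decrypts $\tilde{y} = \mathrm{Dec}(sk, \hat{c})$, obtains $\tilde{r}_V = \mathrm{DL}_V(\tilde{y})$, verifies that
$\mathcal{V}(I_{\tilde{y} \to \tilde{r}_V}, \mathrm{DL}_V) = 1$ and $\mathrm{Open}(\hat{c}, \tilde{r}_V) = \mathrm{Ver}^{\tilde{y}}_{\mathcal{L},x}$.
If so, sends back $\mathrm{CDL}_P = \mathrm{CDL}(I_{\tilde{y} \leftrightarrow r})$. Otherwise, samples $u \in_R \{0,1\}^n$ and
sends back $\mathrm{CDL}_P = \mathrm{CDL}(I_{u \leftrightarrow u})$.
4. V: Obtains $\tilde{r} = \mathrm{CDL}_P(y)$, accepts iff $f(\tilde{r}) = f(r)$ and $\mathcal{V}(I_{y \leftrightarrow \tilde{r}}, \mathrm{CDL}_P) = 1$.

Fig. 2. Protocol 2: 3-round WZK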
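The decision logic in steps 3 and 4 of Fig. 2 can be sketched as follows. Everything here is an idealized stand-in and an assumption of this sketch: `ideal_dl` and `ideal_cdl` are plain functions that hide nothing, `toy_eval` and `toy_open` record the verifier's circuit and coins in the clear, SHA-256 replaces the injective one-way function f (it is not injective), and the recognition checks are omitted because the ideal objects are correct by construction.

```python
import hashlib
import secrets

BOT = None  # stands in for ⊥

def f(r: bytes) -> bytes:
    """Stand-in for the injective OWF f (SHA-256 is not injective)."""
    return hashlib.sha256(r).digest()

def ideal_dl(s, t):
    """Ideal I_{s->t}; a real DL would hide s and t."""
    return lambda q: t if q == s else BOT

def ideal_cdl(s, t):
    """Ideal I_{s<->t}; a real CDL would hide s and t."""
    return lambda q: t if q == s else (s if q == t else BOT)

def toy_eval(circuit_desc, coins, decrypted_value):
    """Stand-in for c_hat <- Eval(c, C) with the given coins."""
    return {"desc": circuit_desc, "coins": coins, "value": decrypted_value}

def toy_open(c_hat, coins):
    """Stand-in for Open: reveal the evaluated circuit given the right coins."""
    return c_hat["desc"] if coins == c_hat["coins"] else BOT

def prover_step3(c_hat, dl_v, r: bytes):
    y_t = c_hat["value"]                      # models Dec(sk, c_hat)
    coins = dl_v(y_t)                         # try to unlock the coins r_V
    honest = coins is not BOT and toy_open(c_hat, coins) == ("Ver", y_t)
    if honest:
        return ideal_cdl(y_t, r)              # binds y and r
    u = secrets.token_bytes(16)
    return ideal_cdl(u, u)                    # dummy CDL on a random point

def verifier_step4(y: bytes, cdl_p, f_r: bytes) -> bool:
    r_t = cdl_p(y)
    return r_t is not BOT and f(r_t) == f_r

def run(honest_verifier: bool) -> bool:
    r = secrets.token_bytes(16)               # prover's committed value
    y = secrets.token_bytes(16)               # verifier's secret point
    coins = secrets.token_bytes(16)           # verifier's coins r_V
    desc = ("Ver", y) if honest_verifier else ("Leak", y)
    c_hat = toy_eval(desc, coins, y)          # prover holds a valid witness
    cdl_p = prover_step3(c_hat, ideal_dl(y, coins), r)
    return verifier_step4(y, cdl_p, f(r))
```

When the verifier's circuit is not the prescribed verification circuit, the prover answers with a dummy CDL on a fresh random point, so evaluating it on any point the verifier chose returns ⊥ except with negligible probability.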


Theorem 4.1. Let DEL be a 2-message delegation protocol, let DL be a digital
locker and CDL a circular digital locker, and let f be an injective one-way
function. Then Protocol 2 is a WZK argument.


We briefly overview the proof of Theorem 4.1. The full proof can be found in
the full version of this paper [BP11].


Soundness. Soundness is shown in two stages. First, we argue that given V’s
message $(\hat{c}, \mathrm{DL}_V)$, it is hard to recover the underlying secret point y. That is, no
poly-size circuit family can recover y, except with negligible chance. Indeed, the
auxiliary input obfuscation guarantee implies that if y can be recovered from
$\mathrm{DL}_V$ and the related auxiliary information $\hat{c}$, it can also be recovered solely
from $\hat{c}$. However, since $x \notin \mathcal{L}$ and DEL is function hiding, y cannot be recovered
from $\hat{c}$ (similarly to the WH protocol).

Second, we show that any cheating prover P* can be used to recover y from
V’s message. Assume WLOG that P* is deterministic, and note that, in its
first message, P* sends some (fixed) f(r). Since f is injective, P* is in fact
“committed” to the corresponding fixed r. We can then feed P* with V’s message
and get back $\mathrm{CDL}_P$. Noting that whenever P* convinces V, $\mathrm{CDL}_P(r) = y$, we
can run $\mathrm{CDL}_P$ on r (given as non-uniform advice) and obtain y with noticeable
probability.


Weak zero-knowledge. We present a WZK simulator that, given an adversary V*
and a distinguisher D, simulates the view of V* with respect to D. Let $V^*_D$ be the
composition of D with V*. $V^*_D$ outputs a bit after receiving $\mathrm{CDL}_P = \mathrm{CDL}(I_{y \leftrightarrow r})$
as the last message. In particular, there exists a PPT $S_{\mathrm{CDL}}$ that simulates $V^*_D$’s
output given oracle access to $I_{y \leftrightarrow r}$ and auxiliary input ai = (z, x, c, f(r)),
representing the rest of V*’s view.

The WZK simulator S will simulate ai on its own, and utilize $S_{\mathrm{CDL}}$ to simulate
$\mathrm{CDL}_P$ as the last message. To simulate ai, S samples r and computes f(r). The
ciphertext c is simulated by generating a random key $sk \leftarrow \mathrm{Gen}(1^n)$ and computing
$c = \mathrm{Enc}(sk, 0^{|w|})$ (instead of an encryption of w as in a true interaction). The input
hiding of DEL implies that the simulated ai is indistinguishable from the true ai. We explain
how $S_{\mathrm{CDL}}$ is used to simulate the last obfuscation message. S first obtains the
verifier’s message $(\mathrm{DL}_{V^*}, \hat{c})$. It then runs $S_{\mathrm{CDL}}$ with the simulated ai, monitoring
all its oracle queries. We treat two separate cases: (a) $S_{\mathrm{CDL}}$ makes a query y which
unlocks $\mathrm{DL}_{V^*}$; (b) $S_{\mathrm{CDL}}$ never makes such a query, in which case we always answer
its queries with $\bot$.

The first case corresponds to a verifier that “knows” the secret point y. In
this case, our simulator can perfectly simulate the behavior of P. That is, it can
“open” $\hat{c}$ to check its validity and consistency with $\mathrm{DL}_{V^*}$, and send back the
corresponding CDL.

The second case corresponds to a cheating V* that either produces an invalid
message or somehow produces a valid message but without actually “knowing”
the secret y. In this case, the simulator will always return a “dummy
obfuscation”. This simulates the behavior of the honest prover P. Indeed, if V*’s
message is invalid, the prover also produces a “dummy obfuscation”. If V* does
not “know” y, it cannot distinguish P’s message from a “dummy obfuscation”.


The simulator. Let V* be any verifier, and let D be the distinguisher circuit.
Denote by $V^*_1(x, z, c, f(r))$ the algorithm that runs V*(x, z), feeds it with (c, f(r))
as the first message, and outputs V*’s message. Denote by $V^*_2(x, z, c, f(r), \mathrm{CDL}_P)$
the algorithm that runs V*(x, z), feeds it with (c, f(r)) as a first message,
with $\mathrm{CDL}_P$ as a second message, and returns V*’s output. Also, denote by
$V^*_D(x, z, c, f(r), \mathrm{CDL}_P)$ the algorithm that runs $V^*_2(x, z, c, f(r), \mathrm{CDL}_P)$, applies
the circuit D on the output of $V^*_2$ and returns the output bit of D.

Let $S_{V^*,D}(x, z, c, f(r))$ be the PPT obfuscation simulator of $V^*_D$ as specified
by Definition 2.4. Also let $\ell(n)$ be the length of a witness for instances of length
n. The description of the simulator is given by Algorithm 1.
Algorithm 1. Simulator S
Input: $x \in \mathcal{L}$, $z \in \{0,1\}^*$

Set $\tilde{y} = \bot$.
Sample $r, u \in_R \{0,1\}^n$.
Obtain $sk \leftarrow \mathrm{Gen}(1^n)$.
Compute $c \leftarrow \mathrm{Enc}(sk, 1^{\ell(|x|)})$.
Compute $(\hat{c}, \mathrm{DL}_V) = V^*_1(x, z, c, f(r))$.
Emulate $S_{V^*,D}(x, z, c, f(r))$.
for each oracle query Q made by $S_{V^*,D}$ do
    if $\mathrm{DL}_V(Q) = \bot$ then
        Answer S’s query with $\bot$ and continue the emulation.
    else
        Set $\tilde{r}_V = \mathrm{DL}_V(Q)$.
        if $\mathcal{V}(I_{Q \to \tilde{r}_V}, \mathrm{DL}_V) = 1$ then
            Set $\tilde{y} = Q$.
        end if
        End the emulation of $S_{V^*,D}$.
    end if
end for
if $\tilde{y} = \bot$ or $\mathrm{Open}(\hat{c}, \tilde{r}_V) \neq \mathrm{Ver}^{\tilde{y}}_{\mathcal{L},x}$ then
    return $V^*_2(x, z, c, f(r), \mathrm{CDL}(I_{u \leftrightarrow u}))$.
else
    return $V^*_2(x, z, c, f(r), \mathrm{CDL}(I_{\tilde{y} \leftrightarrow r}))$.
end if
Abstract. Traditional security definitions in the context of secure communication
specify properties of cryptographic schemes. For symmetric
encryption schemes, these properties are intended to capture the protec- 
tion of the confidentiality or the integrity of the encrypted messages. A 
vast variety of such definitions has emerged in the literature and, despite 
the efforts of previous work, the relations and interplay of many of these 
notions (which are a priori not composable) are unexplored. Also, the 
exact guarantees implied by the properties are hard to understand. 

In constructive cryptography, notions such as confidentiality and in- 
tegrity appear as attributes of channels, i.e., the communication itself. 
This makes the guarantees achieved by cryptographic schemes explicit, 
and leads to security definitions that are composable. 

In this work, we follow the approach of constructive cryptography, 
questioning the justification for the existing (game-based) security def- 
initions. In particular, we compare these definitions with related con- 
structive notions and find that some are too weak, such as INT-PTXT, 
or artificially strong, such as INT-CTXT. Others appear unsuitable for 
symmetric encryption, such as IND-CCA. 


Keywords: confidentiality, integrity, constructive cryptography. 


1 Introduction 


Symmetric encryption protects the confidentiality of messages transmitted be- 
tween two parties that share a secret key. Intuitively, this means that the en- 
crypted message (the ciphertext) transmitted from the sender A to the receiver 
B does not leak information about the contents of the message (other than, 
for example, its length). In contrast, encryption generally does not protect in- 
tegrity: If the ciphertext is modified during transmission, the message obtained 
by decrypting might differ from the original message. 

For some applications of encryption schemes, bare confidentiality is not suf- 
ficient. In his analysis of the Authenticate-then-Encrypt (AtE) transformation, 
Krawczyk [18] constructs an encryption scheme that guarantees confidentiality, 
but if one uses it to encrypt authenticated plaintexts, the combined scheme 
does not guarantee both confidentiality and integrity. The vulnerabilities can 
either be seen as a breach of confidentiality [18] or as a breach of integrity, see
Sect. 4.4. Natural candidates, such as the cipher block chaining mode (CBC)
or stream ciphers, are not vulnerable; they provide weak but sufficient integrity
guarantees [25].

In this paper, we use the approach of constructive cryptography [21, 22] for a
systematic treatment of security notions for symmetric encryption schemes. This 
approach leads to security definitions that capture the exact conditions that the 
schemes have to satisfy to achieve certain guarantees for the message transmis- 
sion. In particular, these definitions are composable, which is instrumental for 
the soundness of a modular protocol design. We then show how different types of 
confidentiality and integrity are captured and compare these notions with sev- 
eral security definitions from the literature. This shows that some of the previous 
definitions are either too weak or artificially strict (which is in general undesired 
as it may lead to disregarding efficient schemes that are indeed sufficient). 


1.1 Game-Based Security Definitions 


Most widely-used security definitions for cryptographic schemes in the context 
of secure communication are game-based. The main concept of these definitions 
is an interaction of two (hypothetical) entities: The challenger and the attacker. 
During this interaction, the attacker issues certain “oracle queries” to the chal- 
lenger; these queries model the use of the scheme in applications. The game also 
specifies a goal for the attacker, which often corresponds to forging a message 
or distinguishing encryptions of different messages. The infeasibility of achieving 
this goal is supposed to capture the guarantees required from the scheme. 

Unfortunately, the oracle queries and winning conditions of games encode the 
use and guarantees only implicitly, and the exact guarantees are often hard to 
understand. In particular, such security definitions are generally not composable, 
and subtle details often have a significant impact on the resulting guarantees: 
Examples where slight slackness in the oracle queries rendered the guarantees of 
games too weak are discussed in Sect. 


1.2 Constructive Cryptography 


The foundational idea of constructive cryptography [21, 22] is to specify both the
setup assumptions and the guarantees of protocols explicitly as resources, and
to consider a protocol as a transformation of such resources. Here, a resource is
a shared functionality accessed by several parties (similar to the ideal
functionalities in frameworks such as [2, 8]). Real resources are assumed functionalities
needed for executing protocols (such as a network) and ideal resources describe 
the guaranteed functionalities the parties want to achieve. The way a party ac- 
cesses a resource is described by the interface provided by the resource to this 
party; the resource provides one interface per party. 

A converter system formalizes the actions that a party performs locally, for
example when it uses a cryptographic scheme. A converter has two interfaces: 
The inner interface is attached to an interface of the resource, and the outer 
interface is used by the party instead of the original interface of the resource. In
particular, the composition of the resource and the converter is again a resource
with one interface for each party, which is depicted in Fig. 1 for the case of
symmetric encryption.

A protocol is a tuple (in our context just a pair) of converters; there is one such
system for each (honest) party. The goal of a protocol is to construct a specified
ideal resource from available real resources, where the meaning of “construct” 
is made precise in Sect. The constructed ideal resources can again serve as 
real resources for other protocols. 


1.3 Secure Communication 


The resources considered in this work are communication functionalities with 
different types of security guarantees, and the goal of a cryptographic protocol 
is to construct a functionality with stronger guarantees from one (or more) with 
weaker guarantees. As the setting for communication security is described by 
two (honest) entities that communicate in a potentially hostile environment, we 
consider resources with three interfaces: One interface labeled A for the sender,
one labeled B for the receiver, and a third one that is labeled E and captures
potential adversarial access. A resource of this type is called a channel (from A 
to B), and its security properties are described by the capabilities provided at 
the E-interface. The basic types of channels are (informally) described in the 
following table, using the notation of [24]. 


$\rightarrow$ An insecure channel leaks the complete messages at the E-interface, and
allows at the E-interface to delete, change, or inject messages.

$\bullet\rightarrow$ An authenticated channel leaks the complete messages. The E-interface
only allows to forward or to delete messages.

$\rightarrow\bullet$ A confidential channel only leaks the length of the messages, but allows
to delete, change, or inject messages.

$\bullet\rightarrow\bullet$ A secure channel only leaks the length of the messages and only allows
to forward or to delete messages.


The intuitive interpretation of the symbol “$\bullet$” is that the capabilities at the
marked (sender’s or receiver’s) side of the channel are provided exclusively to
that party. Consequently, if one side is not marked, the adversary might also
be able to send or receive messages. A shared secret key is a system $\bullet{=}\bullet$ that
outputs the same random value at the A- and B-interfaces, and does not interact
at the adversarial interface. This system models the key that is required by
(symmetric) schemes; it could be generated in a key agreement protocol.
Security mechanisms such as encryption or MAC schemes are protocols that
transform one type of channel (and possibly a shared secret key) into a “more
secure” type of channel. In Fig. 1, the protocol (enc, dec) uses as resources a
channel $\rightarrow$ and a key $\bullet{=}\bullet$. The converter enc is attached with its inner interface
to the A-interfaces of $\rightarrow$ and $\bullet{=}\bullet$ (dec is attached to the B-interfaces), and
the outer interfaces of enc and dec are the interfaces of the constructed (dashed)
system, which is again a channel. For more examples, we refer to ,


Fig. 1. Encryption protocol (enc, dec) applied to the channel $\rightarrow$ and the key $\bullet{=}\bullet$


1.4 Related Work 


The major part of research on (symmetric) encryption schemes has been pursued 
in game-based security models. The nowadays “standard” confidentiality notions 
IND-CPA and IND-CCA are derived from [14] and have been translated to the 
setting of symmetric encryption schemes by [3]. Further variants of these notions 
are introduced and compared in [16]. Several types of integrity guarantees have
been considered: Notions of non-malleability have been translated in [5] from 
the respective public-key notions [12]. Further standard notions are INT-CTXT 
and INT-PTXT (integrity of ciphertext and integrity of plaintext, respectively) 
introduced and analyzed in [5], their relation is further examined in [28]. Also, 
various types of unforgeability notions appear in the literature . 

The security requirements for schemes used to protect communication over
insecure networks are often specified as a combination of properties for
confidentiality and integrity, where the standard combination is IND-CPA and
INT-CTXT [B7]; combinations with weaker types of integrity properties appear
in [5, 9, 13, 17, 27]. A single game-based notion for authenticated encryption
appeared in [31, 34]. A different approach is taken in the definition of [9]: While
confidentiality is similar to IND-CPA, authenticity is simulation-based; equiva- 
lent fully game-based notions appear in [27]. Fully simulation-based definitions 
of secure communication have been provided in [29] for Reactive Simulatability 
and in [10] in the UC framework. 


1.5 Outline 


We analyze confidentiality and integrity notions for (symmetric) encryption
schemes using the paradigm of constructive cryptography. Sect. 2 introduces
the notation and the general model, and Sect. 3 shows how different types of
confidentiality and integrity guarantees are captured. In Sect. 4, we compare
various existing game-based security definitions to the notions in our model.


2 Preliminaries 


We use the concept of abstract systems [22, 23] to formulate our results. At the
highest level of abstraction, a system is an object with interfaces via which it
interacts with its environment and with other systems. Every two systems can be
composed by connecting one interface of each system, and the composed object
is again a system. Also, every two different systems are mutually independent.


2.1 Notation 


We consider two distinct types of systems, resources and converters, and we 
describe topologies of these systems using the notation from [23]. Resources, 
with three interfaces labeled by A, B, and E, are denoted either by special 
symbols or by upper case boldface letters. Converters, with one inner and one 
outer interface, are denoted either by small Greek letters or by special identifiers 
such as enc or dec; the set of all converters is denoted as $\Sigma$.

The composition of a resource R and a converter $\phi$ is written as $\phi^I \mathbf{R}$, where
the label $I \in \{A, B, E\}$ means that the inner interface of $\phi$ is attached to the
I-interface of the resource R. Note that the composed system is again a resource
that exposes the outer interface of $\phi$ as the I-interface together with the other
interfaces of R. A protocol is a pair of converters, one for each honest party, and
applying the protocol $(\phi_1, \phi_2)$ to the resource R is defined as $\phi_1^A \phi_2^B \mathbf{R}$, that is,
attaching the converters to the A- and B-interfaces of the resource.

If two resources R and S are used in parallel, this is denoted as $\mathbf{R} \| \mathbf{S}$ and is
again a resource with the same set of interfaces; each of these interfaces A, B, or
E of $\mathbf{R} \| \mathbf{S}$ allows to access the corresponding interfaces of both sub-systems R
and S. The sequential composition of converters is denoted by $\psi \circ \phi$, and is defined
by $(\psi \circ \phi)^I \mathbf{R} = \psi^I(\phi^I \mathbf{R})$ for all resources R. The parallel composition $\psi \| \phi$ of
converters is defined by $(\psi \| \phi)^I(\mathbf{R} \| \mathbf{S}) = (\psi^I \mathbf{R}) \| (\phi^I \mathbf{S})$ for all R and S. The term
id refers to the “identity converter” that forwards all inputs and outputs.

In general, for bit-strings $x = x_1 \cdots x_n \in \{0,1\}^n$ and $l \le n$, we denote by
$x|_l$ the sub-string $x|_l = x_1 \cdots x_l$. We extend the operation “$\oplus$” to bit-strings by
defining, for $x = x_1 \cdots x_n$ and $x' = x'_1 \cdots x'_n$, the ith bit of $x \oplus x'$ to be $x_i \oplus x'_i$.
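As a small worked example of this notation, the sketch below represents bit-strings as Python strings of the characters 0 and 1. The function names `prefix` and `xor_bits` are illustrative only and do not appear in the paper.

```python
def prefix(x: str, l: int) -> str:
    """Return x|_l, the first l bits of the bit-string x."""
    if l > len(x):
        raise ValueError("prefix length exceeds the length of x")
    return x[:l]

def xor_bits(x: str, x_prime: str) -> str:
    """Bitwise XOR of two bit-strings of equal length."""
    if len(x) != len(x_prime):
        raise ValueError("bit-strings must have equal length")
    return "".join("1" if a != b else "0" for a, b in zip(x, x_prime))
```

For instance, the prefix of 10110 of length 3 is 101, and XORing 1100 with 1010 position by position gives 0110.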


2.2 Discrete Systems 


In the analysis of protocols, we model all systems as (probabilistic) discrete 
systems that communicate by passing messages, where the term “discrete” refers 
to the value spaces of the messages as well as the time. The behavior of discrete 
systems is formalized by random systems [20], i.e., conditional distributions of 
the outputs of the system (as random variables) given all previous inputs and 
outputs. Each input or output is associated to a specific interface. 

Discrete systems are an instance of the abstract systems concept described 
above. The composition of two discrete systems (such as connecting a resource 
and a converter via interfaces) is a discrete system whose behavior is defined via 
an interaction of the two sub-systems: A message that is input to the system 
is processed by the sub-system corresponding to the (external) interface where 
the message was input, and, if the sub-system provides output at the (internal) 
connected interface, this value is processed by the other sub-system. Once one 
of the two sub-systems outputs a message at an external interface, this becomes 
the output of the composed system. The parallel composition of two resources is
defined asynchronously: Each input at an interface A, B, or E explicitly specifies 
one of the sub-systems, and this sub-system is invoked with the input. 

A distinguisher D is a system that connects to all interfaces A, B, and E
of a resource U and outputs (at a separate interface) a single bit, here called
W. The complete interaction of D and U defines a random experiment, and the
probability that the bit W is 1 is written as $P^{D\mathbf{U}}(W = 1)$. The distinguishing
advantage of D for U and V measures how much the output of D differs when 
it is connected to either U or V. Intuitively, if no (efficient) distinguisher dif- 
ferentiates between the two systems, they can be used interchangeably in any 
environment (as otherwise the environment serves as a distinguisher). 


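Definition 1 (Distinguishing advantage). The distinguishing advantage of
a distinguisher D for the systems U and V is defined as

$$\Delta^{D}(\mathbf{U}, \mathbf{V}) := \left| P^{D\mathbf{U}}(W = 1) - P^{D\mathbf{V}}(W = 1) \right|,$$

where W is the special output of D. The advantage for a set $\mathcal{D}$ of distinguishers
is defined as $\Delta^{\mathcal{D}}(\mathbf{U}, \mathbf{V}) := \sup_{D \in \mathcal{D}} \Delta^{D}(\mathbf{U}, \mathbf{V})$.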
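The definition can be made tangible with a Monte Carlo estimate for two very simple systems. This is only an illustrative sketch: the two "systems" are single-query bit sources (a fair one and one that outputs 1 with probability 0.75, both hypothetical), and the distinguisher simply outputs the bit it receives, so the true advantage is |0.5 - 0.75| = 0.25.

```python
import random

def fair_bit(rng: random.Random) -> int:
    """System U: outputs a uniformly random bit."""
    return rng.getrandbits(1)

def biased_bit(rng: random.Random, p: float = 0.75) -> int:
    """System V: outputs 1 with probability p."""
    return 1 if rng.random() < p else 0

def distinguisher(system, rng: random.Random) -> int:
    """D: query the system once and output W, the bit it returned."""
    return system(rng)

def estimate_advantage(dist, sys_u, sys_v, trials: int = 20000, seed: int = 0) -> float:
    """Empirical estimate of |P[W = 1 with U] - P[W = 1 with V]|."""
    rng = random.Random(seed)
    p_u = sum(dist(sys_u, rng) for _ in range(trials)) / trials
    p_v = sum(dist(sys_v, rng) for _ in range(trials)) / trials
    return abs(p_u - p_v)
```

The estimate converges to the advantage of this one distinguisher; the advantage for a set of distinguishers is the supremum over the set and cannot be obtained by sampling a single D.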


2.3 The Simulation-Based Security Definition 


The paradigm of constructive cryptography is derived from [23] and follows the
ideal world/real world approach similar to [8, 29]: The “real world” describes
the protocol execution with two honest parties and an adversary, and is defined
by the composition of the two converters of the protocol $(\pi_1, \pi_2)$ with the real
resource R. In the “ideal world”, the ideal resource S specifying the security
goals is composed with a simulator $\sigma$ connected to the E-interface. The purpose
of $\sigma$ is to adapt the E-interface of S such that it resembles the corresponding
interface of $\pi_1^A \pi_2^B \mathbf{R}$. (As the adversary can emulate the behavior of $\sigma$, using $\sigma^E \mathbf{S}$
instead of S can only restrict the adversary’s power, so using $\sigma^E \mathbf{S}$ and hence
$\pi_1^A \pi_2^B \mathbf{R}$ instead of S is safe.)

To exclude trivial protocols, we require that if no adversary is present, the 
protocol must implement the specified functionality. In the definition, we use 
the special converter “$\bot$” that, when attached to a certain interface of a system,
blocks this interface for the distinguisher. 


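Definition 2 (Secure construction). The protocol $\pi = (\pi_1, \pi_2)$ constructs S from the
resource R within $\varepsilon$ and with respect to the set $\mathcal{D}$ of distinguishers if

$$\exists \sigma : \quad \Delta^{\mathcal{D}}(\pi_1^A \pi_2^B \mathbf{R}, \sigma^E \mathbf{S}) \le \varepsilon \quad \text{and} \quad \Delta^{\mathcal{D}}(\pi_1^A \pi_2^B \bot^E \mathbf{R}, \bot^E \mathbf{S}) \le \varepsilon.$$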


An important property of Definition 2 is its composability. Intuitively, if a
resource S is used in the construction of a larger system, then composability
implies that S can be replaced by a construction $\pi_1^A \pi_2^B \mathbf{R}$ without affecting the
security of the composed system. Theorem 1, taken from [25], shows that security
and availability are preserved under sequential and parallel composition.
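Theorem 1 (Composition for the 3-party setting). Let R, S, T, and U
be resources, and let $\pi = (\pi_1, \pi_2)$ and $\psi = (\psi_1, \psi_2)$ be protocols such that $\pi$
constructs S from the resource R within $\varepsilon_1$ and $\psi$ constructs T from S within
$\varepsilon_2$. If the considered class of distinguishers is closed under composition with
converters, that is $\mathcal{D} \circ \Sigma \subseteq \mathcal{D}$, then $(\psi_1 \circ \pi_1, \psi_2 \circ \pi_2)$ constructs T from R within
$\varepsilon_1 + \varepsilon_2$, $(\pi_1 \| \mathrm{id}, \pi_2 \| \mathrm{id})$ constructs $\mathbf{S} \| \mathbf{U}$ from $\mathbf{R} \| \mathbf{U}$ within $\varepsilon_1$, and $(\mathrm{id} \| \pi_1, \mathrm{id} \| \pi_2)$
constructs $\mathbf{U} \| \mathbf{S}$ from $\mathbf{U} \| \mathbf{R}$ within $\varepsilon_1$.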
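The bound $\varepsilon_1 + \varepsilon_2$ for sequential composition can be motivated by the following sketch of the usual triangle-inequality argument for the first condition of Definition 2. It is not the proof from [25]. It assumes that $\sigma_1$ and $\sigma_2$ are the simulators guaranteed for $\pi$ and $\psi$ respectively, and that converters attached to different interfaces commute.

```latex
% Sketch only: \sigma_1, \sigma_2 are the simulators for \pi and \psi (assumed names).
\begin{align*}
\Delta^{\mathcal{D}}\bigl(\psi_1^A \psi_2^B \pi_1^A \pi_2^B \mathbf{R},\; \sigma_1^E \sigma_2^E \mathbf{T}\bigr)
 &\le \Delta^{\mathcal{D}}\bigl(\psi_1^A \psi_2^B \pi_1^A \pi_2^B \mathbf{R},\; \psi_1^A \psi_2^B \sigma_1^E \mathbf{S}\bigr)
    + \Delta^{\mathcal{D}}\bigl(\sigma_1^E \psi_1^A \psi_2^B \mathbf{S},\; \sigma_1^E \sigma_2^E \mathbf{T}\bigr) \\
 &\le \Delta^{\mathcal{D}}\bigl(\pi_1^A \pi_2^B \mathbf{R},\; \sigma_1^E \mathbf{S}\bigr)
    + \Delta^{\mathcal{D}}\bigl(\psi_1^A \psi_2^B \mathbf{S},\; \sigma_2^E \mathbf{T}\bigr) \\
 &\le \varepsilon_1 + \varepsilon_2 .
\end{align*}
```

The first step inserts the intermediate system $\psi_1^A \psi_2^B \sigma_1^E \mathbf{S} = \sigma_1^E \psi_1^A \psi_2^B \mathbf{S}$. The second step absorbs the converters $\psi_1, \psi_2$ (respectively $\sigma_1$) into the distinguisher, which is where the closure condition $\mathcal{D} \circ \Sigma \subseteq \mathcal{D}$ is used.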


In asymptotic statements, a system S implicitly refers to a family of systems
$\{\mathbf{S}_k\}_{k \in \mathbb{N}}$, and the distinguishing advantage is a real-valued function in the
parameter k: For each k, one considers the distinguishing advantage where, for all
involved systems, one takes the element described by this k. Efficiency notions 
for sets of systems and a negligibility notion for the distinguishing advantage 
can be chosen such that they are closed under composition. Examples are the 
sets of systems with a polynomial bound on the number of queries and/or the 
run-time, together with the standard notion of negligibility. 


2.4 Resources and Protocols as Discrete Systems 


This section details the resources and protocols considered in the setting of secure 
communication. 


Channels. Let M be a discrete set; we usually consider M := {0,1}*. A channel
with message space M is a resource that takes at the A-interface inputs from
the set M and provides at the B-interface outputs from M ∪ {Ø}, where
the element Ø is interpreted as indicating a transmission error. A single-use
channel allows for exactly one input at the A-interface and one output at the
B-interface; a multiple-use channel allows for several (arbitrarily interleaved) such
interactions. The possible interactions at the E-interface describe the security
properties of the channel. For the insecure channel -->, every input m ∈ M
at the A-interface provokes the output m at the E-interface, and every input
m’ ∈ M at the E-interface leads to the output m’ at the B-interface. The
E-interfaces of the “more secure” types of channels are detailed in Sect. 3.


Keys. Let K be a discrete set, usually K := {0,1}^k for some k ∈ ℕ. A key with
key space K is a resource that draws a key κ ∈ K uniformly at random and
outputs it to both A and B. The E-interface does not provide any output.


Encryption Protocols. An encryption protocol with key space K, message space 
M, and ciphertext space C is a pair (enc, dec) of converters. These converters 
connect with their inner interfaces to a shared secret key with key space K and 
to a channel with message space M’ ⊇ C. The resulting resource is a channel
with message space M. 

As an example, we describe the one-time pad encryption for bit-strings with
length at most n. The key space in this setting is K = {0,1}^n, and the message
space of the assumed channel is (in general at least) the set of strings of length
at most n bits, M’ = C = ⋃_{l≤n} {0,1}^l.
Example 1 (The one-time pad). The encryption converter otp-enc (generically
called enc in Fig. 1) obtains as input the n-bit key κ at the inner interface and
a message m with |m| ≤ n at the outer interface. The message transmitted via
the channel is c = m ⊕ κ|_{|m|} (with κ|_l denoting the first l bits of κ). The
decryption converter otp-dec obtains the key κ and the ciphertext c’ at its inner
interface. It computes m’ = c’ ⊕ κ|_{|c’|} and outputs the message m’ at the outer
interface.

Fig. 1 shows the setting in which the encryption and decryption converters
are attached to the resources, the channel --> and the key •==•, with their inner
interfaces. Both the A-interface and the B-interface of the combined (dashed) 
system are of the same type as for the original channel: The A-interface allows 
to input messages from M = C, and the B-interface outputs messages from the 
same set. Hence, the complete system is again a channel with message space M 
(but differs at the E-interface). 

The scheme extends to multiple, say t, messages as follows. Consider a key
with key space {0,1}^{tn}, and encrypt/decrypt the ith message with the bits
(i − 1)n + 1 through (i − 1)n + |m_i|. ◊
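The converters of Example 1 can be sketched in a few lines. This is a minimal illustration, assuming bits are represented as Python lists of 0/1 integers; the function names `otp_enc`, `otp_dec`, and `key_block` are chosen here for illustration and do not appear in the paper.

```python
def otp_enc(key_bits, msg_bits):
    """Encrypt: c = m XOR (first |m| bits of the key)."""
    if len(msg_bits) > len(key_bits):
        raise ValueError("message longer than key")
    # zip stops at the shorter sequence, i.e. it truncates the key to |m|
    return [m ^ k for m, k in zip(msg_bits, key_bits)]


def otp_dec(key_bits, ct_bits):
    """Decrypt: m' = c' XOR (first |c'| bits of the key)."""
    if len(ct_bits) > len(key_bits):
        raise ValueError("ciphertext longer than key")
    return [c ^ k for c, k in zip(ct_bits, key_bits)]


def key_block(key_bits, n, i):
    """Key bits (i-1)n+1 .. i*n (1-indexed), reserved for the ith message."""
    return key_bits[(i - 1) * n : i * n]
```

For t messages, the ith message would be processed as `otp_enc(key_block(key, n, i), m_i)`, which uses exactly the bits (i − 1)n + 1 through (i − 1)n + |m_i| of the long key.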


2.5 Formalizing Games 


In game-based definitions, we formalize both the adversary and the game (or 
challenger) as systems, which are connected via their interfaces as described in 
Sect. 2.2. The game allows the adversary to make certain “oracle queries” via this
interface. Whether or not the game is won is signaled by a special (monotone) 
output bit of G (this can be considered as an additional interface) that is initially 
0 but switches to 1 as soon as the winning condition is fulfilled. For a game G 
and an adversary A, we define the game-winning probability after q steps as 


Γ_q^A(G) := P^{AG}(W_q = 1).


For an adversary that halts after (at most) q steps, we write Γ^A(G) := Γ_q^A(G).
As winning the game with a certain probability might be trivial (such as when 
the goal is to guess a secret bit), one usually considers the advantage of A, 
that is, the (absolute) difference between A’s probability of winning G and the 
probability for “trivial” strategies. 

If a security property of a scheme is defined by the adversary’s inability to
win a game G, then we say that the scheme is ε-secure with respect to that
property and a class² D of adversaries if the advantage for D in winning G is
bounded by ε.


3 Notions of Confidentiality and Integrity 


The security of communication channels corresponds to restrictions on the ca- 
pabilities provided at the E-interface, which can be characterized according to 
two aspects: the amount of information leaked about transmitted messages, and 


2 We will often use the same class D for both adversaries and distinguishers. 
the potential influence on messages delivered to the receiver. Consequently, a
confidentiality guarantee bounds the amount of information that is leaked, and 
an integrity guarantee restricts the adversarial influence on delivered messages. 


3.1 Confidentiality 


A channel is perfectly confidential if no information about the transmitted plain- 
text message is leaked at the E-interface. We also consider weaker types of 
confidentiality where the “amount of leakage” is non-trivial but bounded; the 
(remaining) guarantee is described by a function on the transmitted messages. 


Definition 3 (Leakage specification). For some (discrete) set S, a leakage
specification is a family of functions L = {ℓ_i : M^i → S}_{i≥1}.


Functions ℓ_i on vectors of messages make it possible to capture, for example,
channels that leak whether the same message is sent twice (as in deterministic
encryption).


Definition 4 (Confidential channels). For L = {ℓ_i : M^i → S}_{i≥1}, let -L->•
be the channel that, given inputs m_1,...,m_i at the A-interface, outputs the value
ℓ_i(m_1,...,m_i) at the E-interface (and only allows forwarding or deleting messages).
A channel C is L-confidential if there exists a simulator σ such that


Δ^D(⊥^B C, ⊥^B σ^E(-L->•)) = 0 and Δ^D(⊥^E C, ⊥^E(-L->•)) = 0,


where D is the set of all distinguishers. If M ⊆ {0,1}* and the leakage is restricted
to ℓ_i : (m_1,...,m_i) ↦ |m_i| for all i, the channel is simply called confidential.


The condition of being £-confidential is merely a restriction on the information 
leaked at the E-interface; there is no guarantee on the potential influence of the 
adversary on the delivered messages. In the security condition, this absence of 
guarantees is expressed by attaching the converter ⊥ to the B-interface, which
hides all messages from the distinguisher. 

The goal of an encryption protocol is to construct a confidential channel from 
one that is not confidential. In particular, the one-time pad encryption achieves 
confidentiality in this sense. 


Example 2 (Confidentiality achieved by the one-time pad). The ciphertext generated
by the one-time pad encryption for the message m ∈ M = ⋃_{l≤n} {0,1}^l
is an |m|-bit string of independent and uniformly distributed random bits. The
information leaked to the adversary is exactly the length |m| of the message:
There is a simulator that, given the length |m|, generates a ciphertext that has
exactly the same distribution as the “real” ciphertext for the message m.

This means that the leakage is described by |·| : M → {1,...,n} (for multiple
messages, ℓ_i maps (m_1,...,m_i) to |m_i|). The channel that is constructed by the
one-time pad from the insecure channel is described in the examples below. ◊
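The claim of Example 2 can be checked exhaustively for small parameters. The sketch below, with names chosen here for illustration, counts for a fixed message how often each ciphertext occurs over all n-bit keys and compares this with a simulator that only knows the length and outputs a uniform string of that length. The simulator's counts are scaled by 2^(n − |m|) so that both tables sum to 2^n.

```python
from collections import Counter
from itertools import product


def ciphertext_distribution(m, n):
    """Count ciphertexts of message m over all 2^n one-time pad keys."""
    counts = Counter()
    for key in product((0, 1), repeat=n):
        c = tuple(x ^ k for x, k in zip(m, key))  # uses the first |m| key bits
        counts[c] += 1
    return counts


def simulator_distribution(length, n):
    """Uniform distribution on `length`-bit strings, scaled to total 2^n."""
    weight = 2 ** (n - length)
    return Counter({c: weight for c in product((0, 1), repeat=length)})
```

Since `ciphertext_distribution(m, n)` depends on m only through its length, two messages of equal length yield identical tables, which is the leakage statement in miniature.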
3.2 Integrity


Encryption schemes in general do not protect the integrity of messages: If the
adversary replaces the transmitted ciphertext c for a message m ∈ M by a
ciphertext c’ ≠ c, the receiver will potentially obtain a different message m’ ∈ M
during the decryption. For the adversary (oblivious of m), replacing c by c’
corresponds to selecting a transformation F : M → M that describes, for every
potentially transmitted message m, which message m’ = F(m) the receiver would
obtain, given that the original message was m.


Example 3 (XOR-Malleability of the one-time pad). For the one-time pad encryption,
the adversary can replace the transmitted ciphertext c by an arbitrary
ciphertext c’. Assume that c = m ⊕ κ and |c| = |c’|; then the receiver
will compute m’ = c’ ⊕ κ = c’ ⊕ c ⊕ m. Hence, replacing c by c’ corresponds
to selecting the function m ↦ m ⊕ (c ⊕ c’). ◊
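A concrete run of Example 3, as a small sketch with illustrative values: the adversary never sees the key or the message, yet flipping the last ciphertext bit flips exactly the last plaintext bit. The helper `xor` and all variable names are chosen here for illustration.

```python
def xor(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return [x ^ y for x, y in zip(a, b)]


key = [1, 0, 0, 1, 1, 0]
message = [0, 1, 1, 0, 1, 0]
ciphertext = xor(message, key)           # c = m XOR key

mask = [0, 0, 0, 0, 0, 1]                # adversary's choice: flip the last bit
tampered = xor(ciphertext, mask)         # c' = c XOR mask
received = xor(tampered, key)            # m' = c' XOR key

# The receiver obtains m XOR (c XOR c'), independently of the key.
assert received == xor(message, xor(ciphertext, tampered))
```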


In general, the distribution of each output at the B-interface depends on the pre- 
vious inputs and outputs at all interfaces of the channel. But then, conditioned 
on the complete interaction at the E-interface—the adversary’s knowledge— 
the channel “transforms” all inputs at the A- and all previous outputs at the 
B-interface into the next output at the B-interface; the interaction at the E- 
interface can be seen as a choice of a particular such plaintext transformation. 


Definition 5 (Plaintext transformation). Let M be a discrete set. A plaintext
transformation F on M is a (probabilistic) transformation M* × M* → M.


The arguments of the plaintext transformation are the sequence of messages 
transmitted by the sender, and the sequence of messages previously delivered to 
the receiver; the result is the next message delivered to the receiver. The set of 
all plaintext transformations available to the adversary formalizes the potential 
adversarial influence on the delivered messages. Of course, the fewer such
transformations are available to the adversary, the stronger the integrity guarantees
of the channel. This is captured by the concept of integrity specifications.


Definition 6 (Integrity specification). An integrity specification is a family
F := {F_q}_{q∈ℕ} of random variables with F_q ⊆ F, where F is a set of plaintext
transformations.


The random variables F_q ⊆ F formalize that, depending on the state of the
channel, only a subset of the transformations might actually be accessible: After
the qth query to the channel, the adversary may choose a transformation from the
set F_q (note that this choice corresponds to replacing the transmitted ciphertext
in the “real world”). The generality of this definition is indeed necessary to
describe the malleability of certain encryption schemes, such as CBC mode [25].
There, the availability of certain transformations depends on the randomness
used during the encryption, so F_q ≠ F.


Example 4 (XOR-malleability). Let m, m’, c, and c’ be as in Example 3. If we
set δ := c ⊕ c’, the adversary’s choice to replace c by c’ = c ⊕ δ can be interpreted
as selecting the XOR-mask δ for the transmitted message. More generally, the
plaintext transformations F_{i,j,δ_j}, after i inputs at the A-interface and j inputs at
the E-interface, with δ_j ∈ ⋃_{l≤n} {0,1}^l, are described as follows:


— i < j: the output is a uniformly random |δ_j|-bit string,

— i ≥ j and |δ_j| ≤ |m_j|: the output is m_j|_{|δ_j|} ⊕ δ_j,

— i ≥ j and |δ_j| > |m_j|: the output is m_j ⊕ δ_j|_{|m_j|}, followed by |δ_j| − |m_j| uniformly
random bits.


The transformations available after i inputs at the A- and j inputs at the
E-interface are, for each δ ∈ ⋃_{l≤n} {0,1}^l, the transformations F_{i,j,δ}. ◊
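The three cases of Example 4 translate directly into code. The following is a sketch under the assumption that the sent messages are kept in a list (so i is its length) and that j is 1-indexed; the function name `xor_transformation` and the injectable `rand_bit` source are illustrative choices, the latter only to make the random cases testable.

```python
import secrets


def xor_transformation(sent, j, delta, rand_bit=lambda: secrets.randbits(1)):
    """Evaluate F_{i,j,delta} for i = len(sent) messages sent so far."""
    i = len(sent)
    if i < j:
        # No jth message exists yet: the receiver sees |delta| random bits.
        return [rand_bit() for _ in delta]
    m_j = sent[j - 1]
    # zip truncates to the shorter operand, covering both remaining cases.
    head = [m ^ d for m, d in zip(m_j, delta)]
    if len(delta) <= len(m_j):
        return head                       # m_j truncated to |delta|, XOR delta
    # delta is longer than m_j: pad with |delta| - |m_j| random bits.
    tail = [rand_bit() for _ in range(len(delta) - len(m_j))]
    return head + tail
```

In the one-time pad setting, the random bits in the first and third case correspond to key bits that have not been used for any encryption.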


The set F_q of transformations available after the qth query must be (implicitly
or explicitly) known to the adversary; abstractly, a description of the set F_q is
output to the adversary by the channel. Of course, for a confidential channel, 
this description must not leak any information beyond the information specified 
by the leakage. In the following definition, we refer to the number of queries at 
the A- and E-interfaces by i and j, respectively, and use q := i + j. 


Definition 7 (Malleable confidential channel). Let L be a leakage specification
and F be an integrity specification such that the distribution of each F_q
depends (only) on the leakage ℓ_s(m^s) for 1 ≤ s ≤ i of the messages m_1,...,m_i,
the previous sets F_1,...,F_{q−1}, and the selected transformations F_1,...,F_j. An
F-malleable L-confidential channel -F,L->• (in the following only -->• if L and F
are clear) is an L-confidential channel with malleability described by F.

On receiving m_i at the A-interface, -->• outputs ℓ_i(m^i) and a description
of F_q at the E-interface. Upon receiving a description of F ∈ F_q at the
E-interface, -->• evaluates the transformation F on the plaintexts and outputs the
result at the B-interface. If the ⊥-converter is attached to the E-interface, -->•
immediately forwards each input m_i from the A- to the B-interface.


As an example, we describe the XOR-malleable confidential channel and sketch
the proof that the one-time pad constructs this channel from an insecure channel
and a shared secret key.³


Example 5 (The XOR-malleable channel). The channel -⊕->• behaves as follows.
Upon the ith input m_i ∈ M at the A-interface, leak the length |m_i| at the
E-interface. Upon the jth input δ_j ∈ {0,1}^{≤n} at the E-interface (after i inputs at
the A-interface), output m’_j := F_{i,j,δ_j}(m) at the B-interface.

We use the following simulator σ to prove that the one-time pad indeed constructs
-⊕->•:


— Upon a message ℓ_i ∈ {1,...,n} at the inner interface (i.e., from -⊕->•),
output a uniformly random ℓ_i-bit string c̃_i as the transmitted ciphertext at
the outer interface.


3 For simplicity, we only consider the case i > j. For the general case, cf. Sect. 6.1]. 
— Upon a message c’_j at the outer interface,
• if j > i, input δ_j = 0^{|c’_j|} at -⊕->•,
• if j ≤ i and |c̃_j| ≥ |c’_j|, input δ_j = c̃_j|_{|c’_j|} ⊕ c’_j at -⊕->•,
• else, input δ_j = (c̃_j ⊕ c’_j|_{|c̃_j|}) || 0^{|c’_j|−|c̃_j|} at -⊕->•.

The simulator σ is perfect, i.e., Δ^D(otp-enc^A otp-dec^B (--> || •==•), σ^E(-⊕->•)) = 0
for all distinguishers D:


— On input the ith message m_i at the A-interface, in both cases a |m_i|-bit
uniformly random string is output at the E-interface (generated either by
otp-enc using the key or by σ).

— On input the jth message c’_j at the E-interface, the message output at the
B-interface also has the same distribution in both cases (by construction of
σ; this is a simple check for each of the cases). ◊
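The “simple check for each of the cases” can be done mechanically for small n. The sketch below covers the case j ≤ i and couples the two worlds by letting the simulated ciphertext equal the real one; it then verifies that the deterministic part of the receiver's output agrees. All names are illustrative, and the check is a sanity test, not a proof.

```python
from itertools import product


def bitstrings(max_len):
    """All bit lists of length 1..max_len."""
    for length in range(1, max_len + 1):
        for bits in product((0, 1), repeat=length):
            yield list(bits)


def simulator_delta(c_sim, c_adv):
    """Mask that the simulator inputs to the XOR-malleable channel (case j <= i)."""
    head = [a ^ b for a, b in zip(c_sim, c_adv)]      # XOR on the common prefix
    return head + [0] * max(0, len(c_adv) - len(c_sim))


def check_simulation(n):
    """Compare real decryption with the ideal transformation for all inputs."""
    for key in product((0, 1), repeat=n):
        for m in bitstrings(n):
            c = [x ^ k for x, k in zip(m, key)]        # honest ciphertext
            for c_adv in bitstrings(n):
                real = [x ^ k for x, k in zip(c_adv, key)]
                delta = simulator_delta(c, c_adv)
                ideal_head = [x ^ d for x, d in zip(m, delta)]
                t = min(len(m), len(c_adv))
                if real[:t] != ideal_head[:t]:
                    return False
    return True
```

Bits of the real output beyond position |m| are the adversary's bits XORed with key bits that were never used, so they are uniform, matching the random padding of the ideal transformation.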


Consequently, the one-time pad constructs from the resources •==• and -->
the channel -⊕->•. This channel is confidential according to Definition 4; the
simulator assumed in the definition is trivial (both the confidential channel of
Definition 4 and -⊕->• leak exactly the length of the message).


4 Relation to Game-Based Security Definitions 


In game-based security definitions for encryption schemes, the attacker has ac- 
cess to oracles for encrypting plaintext messages and decrypting or checking the 
correctness of ciphertexts, sometimes with additional constraints on the number 
or order of queries. The attacker’s goal is either to generate a ciphertext that 
satisfies a certain condition, or to distinguish two cases in which it is provided 
with different sets of oracles. For many of these notions, it is not clear which 
guarantees the proven schemes provide when the ciphertexts are transmitted 
over a certain type of network. 

In contrast, a constructive security statement makes these guarantees explicit: 
The confidentiality and integrity guarantees appear as the leakage functions 
and plaintext transformations of the constructed channel. In this section, we 
analyze the semantics of game-based notions from the literature by proving the 
(in)equivalence with corresponding constructive notions. 


4.1 Goals and Attack Models 


Security properties defined using games are often characterized by a goal and 
an attack model. The goal is essentially specified by the winning condition (the 
monotone output switches to 1), and the attack model is characterized by the 
“oracle queries” the adversary has at its disposal. 

The attack model roughly corresponds to adversarial access to the “real re- 
sources” used by the protocol in constructive security statements. The more 
capabilities the game provides, the weaker the security modeled by the real re- 
sources, and the stronger the requirements for the protocol. Roughly, the idea 
of a chosen plaintext attack corresponds to the real resource being an authenti-
cated channel, and a chosen ciphertext attack corresponds to the real resource 
being an insecure channel. The goal of a game corresponds to the attributes of 
the constructed resource. For instance, the IND-type of games are often con- 
nected with confidentiality, whereas NM (non-malleability) and INT (integrity) 
are integrity guarantees. 


4.2 Indistinguishability of Ciphertexts 


The standard security notions for confidentiality are IND-CPA and IND-CCA, 
i.e., indistinguishability (of ciphertexts) under chosen-plaintext and chosen-ci- 
phertext attack, respectively. Several variants appear in the literature; in all 
variants, a bit b € {0,1} is chosen uniformly at random, and, depending on the 
variant, the adversary has access to one of the following settings of oracles: 


— multiple queries at a “real-or-random” oracle where, in each query, the adversary
inputs a plaintext m_0, the game chooses m_1 with |m_0| = |m_1| uniformly
at random, and returns an encryption of m_b;

— multiple queries at a “left-or-right” oracle where the adversary inputs two
messages m_0 and m_1 with |m_0| = |m_1| and obtains an encryption of m_b;

— multiple queries at an “encryption” oracle where, on input m, the adversary 
obtains an encryption of m, as well as one “real-or-random” query; 

— multiple “encryption” queries and one “left-or-right” query. 


Finally, the adversary has to guess the bit b (with probability non-negligibly 
different from 1/2). It turns out that, for any encryption scheme, the advantages 
that can be achieved in the above games are related by a factor that is either a 
constant or linear in the number of queries [3]. 
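As an illustration of the “left-or-right” variant, the following sketch implements the oracle as a small class and runs it against a deliberately broken, deterministic scheme that reuses the same pad for every message. The class `LeftOrRightGame`, the scheme `reused_pad_encrypt`, and the adversary are hypothetical names introduced here; they are not taken from the cited literature.

```python
import secrets


class LeftOrRightGame:
    """Left-or-right oracle: encrypts m_b for a hidden uniform bit b."""

    def __init__(self, encrypt, keygen):
        self.encrypt = encrypt
        self.key = keygen()
        self.b = secrets.randbits(1)

    def lor(self, m0, m1):
        if len(m0) != len(m1):
            raise ValueError("messages must have equal length")
        return self.encrypt(self.key, (m0, m1)[self.b])


def reused_pad_encrypt(key, m):
    """Broken on purpose: deterministic, same pad for every message."""
    return bytes(x ^ k for x, k in zip(m, key))


def repetition_adversary(game):
    """Guess b by checking whether two queries yield equal ciphertexts."""
    zeros, ones = b"\x00" * 4, b"\xff" * 4
    c_first = game.lor(zeros, zeros)      # encrypts zeros in either case
    c_second = game.lor(zeros, ones)      # equals c_first exactly when b = 0
    return 0 if c_first == c_second else 1
```

The adversary wins with probability 1 here because a deterministic scheme leaks whether the same message was sent twice, which is exactly the kind of leakage captured by the functions ℓ_i on message vectors.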


IND-CPA. The term IND-CPA usually refers to a game where the adversary 
has access to the oracles described in one of the four settings above. While these 
settings correspond to assuming that the ciphertexts are transmitted via authen- 
ticated channels (and cannot be changed during the transmission), in several 
practical protocols such as SSL/TLS, the ciphertexts can actually be changed 
during the transmission. Indeed, as confidentiality in the sense of Definition 4 is
defined by restricting only the adversarial interface (the output at the receiver’s 
interface is ignored), one may hope that IND-CPA security will still imply this 
weak form of confidentiality in this setting. The following example shows that 
this is not the case. 

Consider an encryption scheme where a certain ciphertext c̄ ∈ C is never used,
and append, during encryption, to each ciphertext a perfectly hiding commitment
to the plaintext. In particular, expand the secret key using a PRG, use the
first part as key for the encryption and the remainder as randomness in the
commitment. Also, modify the decryption to output the initial secret key if it
receives the special ciphertext c̄. As the decryption algorithm does not appear
in the IND-CPA game and the erroneous decryption does not hurt correctness
(as c̄ is never used), the modified scheme is IND-CPA secure. However, for any
confidential channel, it is easy to construct a distinguisher that differentiates
between the real and the ideal setting (input a message m ∈ M at A, input c̄ at
E, interpret the output at B as the secret key, expand by the PRG, and decrypt
the output at E. If this decrypts to m and the decommitment was correct then
say 0, otherwise say 1).


IND-CCA. In the IND-CCA game, the adversary is, in addition to one type of
oracles of the IND-CPA game, given access to a decryption oracle where it can
query ciphertexts that are different from those it obtained from the encryption
oracle.⁴ While IND-CCA is considered the standard notion for confidentiality in
settings where the adversary can modify ciphertexts, it differs considerably from
the notion implied by Definition 4. In particular:


1. IND-CCA is artificially strict: A scheme that allows “obvious” modifications 
of ciphertexts (e.g., appending bits that are ignored) is considered insecure. 

2. The definition of IND-CCA already implies strong integrity guarantees. 

3. These integrity guarantees seem artificial for symmetric encryption. 


These issues are explained further in the following paragraphs. 


Replayable CCA. Several authors have noticed that IND-CCA is 
artificially strict in the sense that the decryption oracle will decrypt any cipher- 
text except for the exact challenge ciphertext. Schemes that allow for “obvious” 
ciphertext modifications are not IND-CCA secure, the typical separating exam- 
ple being an (otherwise IND-CCA secure) encryption scheme where the encryp- 
tion always appends a single bit to the ciphertext, and this bit is ignored during 
decryption. While this modification does not hurt the security guarantees in any 
meaningful way, the resulting scheme is not IND-CCA secure. 
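The separating example can be made concrete as follows. This is a sketch under several assumptions: the base scheme is a toy encrypt-then-MAC construction built from SHA-256 and HMAC purely as a stand-in (it is not claimed to be a vetted IND-CCA secure scheme), a whole byte is appended instead of a single bit for convenience, and the game class and adversary are hypothetical names introduced here.

```python
import hashlib
import hmac
import secrets


def _stream(key, nonce, length):
    """Hash-based keystream (toy stand-in, not a vetted cipher)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def base_enc(key, m):
    nonce = secrets.token_bytes(16)
    body = bytes(x ^ y for x, y in zip(m, _stream(key[:16], nonce, len(m))))
    tag = hmac.new(key[16:], nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def base_dec(key, c):
    if len(c) < 48:
        return None
    nonce, body, tag = c[:16], c[16:-32], c[-32:]
    expected = hmac.new(key[16:], nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(x ^ y for x, y in zip(body, _stream(key[:16], nonce, len(body))))


def padded_enc(key, m):
    return base_enc(key, m) + b"\x00"          # appended byte carries no meaning


def padded_dec(key, c):
    return base_dec(key, c[:-1]) if c else None  # appended byte is ignored


class IndCcaGame:
    """One left-or-right challenge plus a decryption oracle."""

    def __init__(self):
        self.key = secrets.token_bytes(32)
        self.b = secrets.randbits(1)
        self.challenge = None

    def challenge_query(self, m0, m1):
        if len(m0) != len(m1):
            raise ValueError("messages must have equal length")
        self.challenge = padded_enc(self.key, (m0, m1)[self.b])
        return self.challenge

    def decrypt(self, c):
        if c == self.challenge:                 # only the exact challenge is refused
            return None
        return padded_dec(self.key, c)


def replay_adversary(game):
    m0, m1 = b"left", b"rght"
    c = game.challenge_query(m0, m1)
    mauled = c[:-1] + b"\x01"                   # change only the ignored byte
    return 0 if game.decrypt(mauled) == m0 else 1
```

The adversary always recovers b, although the only thing it can actually do is replay the honest ciphertext in a slightly different encoding.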

In [11], several variants of “replayable” CCA security are analyzed.⁵ In these
games, not only the exact challenge ciphertext is disallowed in decryption queries,
but also “related” ciphertexts. Intuitively, this means that encryption schemes
may allow certain modifications to ciphertexts that do not change the result of
the decryption. In more detail, the notions considered in [11] are:


— IND-RCCA, or “replayable CCA”: any ciphertext that decrypts to one of 
the plaintexts issued to the encryption oracle is disallowed; 

— IND-sd-RCCA, or “secretly detectable RCCA”: intuitively, the receiver can 
detect whether an adversarially generated ciphertext was generated as a 
“modification” of an honestly generated one, or whether it is “independent” 
of all honestly generated ones; these “modified” ciphertexts are disallowed;

— IND-pd-RCCA, or “publicly detectable RCCA”: the above distinction can 
be done publicly, i.e., without knowledge of the secret key. 


4 The reason for the latter restriction is that if the adversary were allowed to decrypt 
the challenge, winning the game would become trivial. 

5 Their original notions regard public-key schemes, but the extensions to symmetric 
schemes are also described. 
The exact formalization is technically involved; for details, we refer to [11].

With respect to achieving secure communication, the guarantees provided by 
IND-CCA and IND-sd-RCCA secure schemes are indeed equivalent, which can 
be formalized via bisimulation. Intuitively, the simulator for the IND-sd-RCCA 
scheme can use the assumed detectability to decide whether a given ciphertext 
should be considered a replay. 


Strong Integrity. An IND-sd-RCCA secure encryption scheme achieves a strong
notion of integrity: The remaining malleability is described by the integrity
specification F_NM with the set {f_m̄ : M → M, m ↦ m̄}_{m̄∈M} of transformations,
where NM refers to “non-malleable.” The proof of the following theorem is
deferred to the full version of this paper.


Theorem 2 (Informal). Let (enc, dec) be a symmetric encryption protocol. If
the protocol is ε-IND-sd-RCCA secure, then it constructs an F_NM-malleable
confidential channel from an insecure channel and a secret key within ε.
Conversely, if the protocol constructs an F_NM-malleable confidential channel
from an insecure channel and a secret key within ε (for distinguishers that issue
at most q queries and with a special type of simulator) then it is
(q² + 1)ε-IND-sd-RCCA secure (with respect to the class of adversaries that issue
at most q queries). For large message spaces, the special type of simulator is general.⁶


Unnatural Malleability. IND-CCA is not a natural security requirement for sym- 
metric encryption: The adversary may generate valid ciphertexts for arbitrary 
plaintexts (but only independently of honestly sent messages). Realistic symmet- 
ric encryption schemes are either malleable (such as the one-time pad or CBC) 
or, if they are non-malleable, they will actually already implement the fully se- 
cure channel (such as authenticated encryption). Here, it becomes apparent that 
IND-CCA has evolved as a notion for public-key schemes, where the adversary 
knows the encryption key and can encrypt arbitrary messages. 


4.3 Specific Variants of Integrity 


Games that are used to characterize integrity properties express impossibilities 
(for the adversary) to generate ciphertexts that satisfy certain conditions. In 
constructive cryptography, integrity guarantees are expressed explicitly by spec- 
ifying the set of transformations that model the capabilities of the adversary. 
The correspondence between these two paradigms is as follows: A scheme is se- 
cure according to a game if and only if it implements a channel that allows no 
transformations that contradict the game; the potential probability in winning 
the game translates into a distinguishing advantage in the constructive security 
statement. 


6 If the distinction between “modified” and “independent” ciphertexts can be
performed without the key, then the condition on the size of the message space is not
needed. If we assume that the distinction is perfect, the factor q² + 1 reduces to 1.
NM-CCA. The notion of non-malleable encryption has been introduced in [12]
in the context of public-key schemes. Intuitively, no attacker (even given honestly
generated ciphertexts) should be able to generate a ciphertext whose decryption
relates to “honestly encrypted” messages in a meaningful way. NM-CCA
is equivalent to IND-CCA [12]; this extends to the RCCA notions [11].
Consequently, these notions also correspond to F_NM-malleable communication.


INT-CTXT. Integrity of ciphertexts has been introduced in and for- 
malizes that the adversary cannot produce any fresh valid ciphertext. In more 
detail, an encryption scheme is said to achieve INT-CTXT security if no adver- 
sary with access to an encryption oracle can generate a valid ciphertext that is 
different from all ciphertexts obtained from the oracle. Here, “valid” means that 
the decryption outputs a message (not an error symbol). Note that existential 
unforgeability and ciphertext unforgeability [18] are similar: The differences 
are, for example, that the definition from [5,6] allows multiple queries to the
challenge oracle, whereas allows only one. 

A symmetric encryption protocol that achieves confidentiality and is addi- 
tionally INT-CTXT secure constructs a fully secure channel from an insecure 
channel. Yet, INT-CTXT, as IND-CCA, is artificially strict concerning modifica- 
tions of ciphertexts. We describe a relaxation of INT-CTXT which is constructed 
analogously to IND-sd-RCCA. In particular, we also require the existence of a 
secretly (i.e., given the secret key) computable relation, called ≡_κ, on C with
the same properties as for IND-sd-RCCA; this relation formalizes the receiver’s 
ability to distinguish “modified” and “independent” ciphertexts generated by 
the adversary. 

We define INT-sd-CTXT security by changing the INT-CTXT game as follows:
The adversary wins only if dec(κ, c’) ≠ Ø and ∀r ≤ i: c’ ≢_κ c_r for all
honestly generated c_r. Note that we also have to change the output of the oracle
in the case that c’ ≡_κ c_r holds (for some r) to be m_r. The proof of the following
theorem is deferred to the full version of this paper.


Theorem 3 (Informal). Let (enc, dec) be a symmetric encryption protocol that
constructs a confidential channel from an insecure channel and a secret key
within ε_1. If the protocol is ε_2-INT-sd-CTXT secure, then it constructs a secure
channel from an insecure channel and a secret key within ε_1 + ε_2. Conversely, if
the protocol constructs the secure channel within ε for distinguishers in D_q, then
it is (q² + 2)ε-INT-sd-CTXT secure with respect to D_q.⁷


INT-PTXT. Integrity of plaintexts has been defined in and is weaker than 
INT-sd-CTXT. The adversary is also given access to an encryption oracle, but 
to win the game, it has to fabricate a ciphertext that decrypts to a plaintext 
that has not been queried at the encryption oracle before. This notion is weaker 
than INT-sd-CTXT in the sense that the adversary may still be able to generate 
a ciphertext that decrypts to plaintext that was queried at the encryption oracle 
but cannot be detected to be a modification of one particular honestly generated 


7 The factor q² + 2 appears for the same technical reasons as for IND-sd-RCCA.
ciphertext (even if all ciphertexts are delivered). This weakens the guarantees in
two aspects: First, the adversary can replay messages undetectably, and second, 
the adversary may fabricate messages that decrypt to any one of the previous 
messages with some probability that may even depend on the plaintexts. Conse- 
quently, if the adversary is able to determine which of the original plaintexts has 
been received, he will potentially obtain information about some transmitted 
plaintext. 

An integrity specification is value-preserving if all transformations F :
M* × M* → M have the property that the output message is either one of
the input messages or Ø, but any one of these may appear with some probability
(which may even depend on the plaintexts). The proof of the following
theorem is deferred to the full version of this paper.


Theorem 4. Let (enc, dec) be a symmetric encryption protocol that constructs
a confidential channel from an insecure channel and a secret key within ε_1. If the
protocol is ε_2-INT-PTXT secure, then it constructs an F_VP-malleable confidential
channel within ε_1 + ε_2, with F_VP being value-preserving. Conversely, if the protocol
constructs an F_VP-malleable confidential channel within ε_1 such that F_VP is
value-preserving, then it is ε_1-INT-PTXT secure.


Namprempre [27] introduces a related but stricter notion called SINT-PTXT, 
which prohibits replaying messages arbitrarily. There, the adversary also wins the 
game if it generates ciphertexts such that the decryption outputs any plaintext 
more often than it was queried at the encryption oracle before. Consequently,
SINT-PTXT corresponds to a channel with this bounded type of replay.
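The difference between the two winning conditions is easy to state in code. The sketch below assumes that the game records the plaintexts submitted to the encryption oracle and the results of all decryptions (with `None` standing for the error symbol); the function names are illustrative.

```python
from collections import Counter


def int_ptxt_win(encrypted, decrypted):
    """INT-PTXT: some decryption yields a plaintext that was never queried."""
    queried = set(encrypted)
    return any(m is not None and m not in queried for m in decrypted)


def sint_ptxt_win(encrypted, decrypted):
    """SINT-PTXT: some plaintext is output more often than it was queried."""
    budget = Counter(encrypted)
    seen = Counter(m for m in decrypted if m is not None)
    return any(seen[m] > budget[m] for m in seen)
```

For instance, with encryption queries `[b"a", b"b"]` and decryption results `[b"a", b"a"]`, the replay wins the SINT-PTXT game but not the INT-PTXT game.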


Fixing the definition from [5,6]. In the original game, the output of the verification
oracle is one bit indicating whether the decrypted plaintext is valid. This
renders the notion too weak: If (via a higher-level protocol), the adversary learns 
which of the valid plaintexts has been obtained by decrypting (this probability 
may depend on secret values), this is not captured. Hence, this notion cannot 
guarantee composability. A slight modification to the game fixes this issue: The 
verification oracle returns the decrypted message (instead of the single bit). The 
following (artificial) encryption scheme exemplifies the weakness. 


Example 6. Consider a scheme (enc, dec) secure according to the stricter notion.
Change the decryption such that for (n, c_0, c_1) with dec_κ(c_b) ≠ ⊥, b ∈ {0,1},
the output is dec_κ(c_{κ_n}) (with κ_n the nth bit of κ). ◊


The change does not affect the security with respect to the notion of [5,6]: The
output of the oracle on (n, c_0, c_1) can be easily computed from the output on c_0
and c_1. In contrast, in the strengthened game, such queries reveal the secret key.
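The key leak in the strengthened game can be demonstrated with a toy stand-in. In the sketch below, the base “scheme” merely appends an HMAC tag to the message, so it offers no confidentiality and serves only to have valid and invalid ciphertexts; the modified decryption follows Example 6 with 0-indexed key bits. All function names are hypothetical.

```python
import hashlib
import hmac


def base_enc(key, m):
    """Stand-in scheme: message plus tag (no confidentiality, illustration only)."""
    return m + hmac.new(key, m, hashlib.sha256).digest()


def base_dec(key, c):
    if len(c) < 32:
        return None
    m, tag = c[:-32], c[-32:]
    valid = hmac.compare_digest(tag, hmac.new(key, m, hashlib.sha256).digest())
    return m if valid else None


def key_bit(key, n):
    """nth key bit, 0-indexed, most significant bit of each byte first."""
    return (key[n // 8] >> (7 - n % 8)) & 1


def modified_dec(key, query):
    """On (n, c0, c1) with both parts valid, decrypt the part selected by key bit n."""
    n, c0, c1 = query
    if base_dec(key, c0) is None or base_dec(key, c1) is None:
        return None
    return base_dec(key, (c0, c1)[key_bit(key, n)])


def recover_key(enc_oracle, dec_oracle, key_len_bytes):
    """Read the key bit by bit from an oracle that returns decrypted messages."""
    c0, c1 = enc_oracle(b"0"), enc_oracle(b"1")
    bits = [1 if dec_oracle((n, c0, c1)) == b"1" else 0
            for n in range(8 * key_len_bytes)]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
```

An oracle that only reports validity would answer every such query with the same bit, which is why the original game does not notice the leak.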


Plaintext Uncertainty. This notion from [13] attempts to capture that the 
adversary cannot “control” the result of a forgery. While the description is rather 
informal, it captures that the decrypted message contains a certain amount of
entropy (for each message, the probability that this message is obtained by
decrypting is small). This is hard to achieve, at least for multiple decryptions,
since the only entropy in the (otherwise deterministic) decryption is “fresh” key
material; the computational (pseudo-entropy) version, however, might prove
useful in applications.

The corresponding integrity specification is the set of transformations that 
have at least a certain min-entropy, meaning that for each input $m$ and
transformation $F$, the min-entropy of the random variable $F(m)$ is larger than some
bound. Computational indistinguishability from such a channel means that the 
output at the receiver’s interface has a certain pseudo-entropy. 


Known-Plaintext Forgery. This notion from [13] is intended to capture that
the adversary providing a forged ciphertext can predict the changes to the trans- 
mitted message. The (informal) description in [13] states that the adversary 
could have computed the outcome with overwhelming probability (this can be 
formalized by means of an extractor). In the language of integrity specifications, 
this means that all transformations in F are deterministic (and efficiently com- 
putable). Properties of this type can indeed be helpful, as can be seen in the 
proof of the soundness of Authenticate-then-Encrypt in [25]. 


4.4 Combining Notions of Confidentiality and Integrity 


Traditionally, security requirements for schemes for protecting communication 
are expressed as a combination of separate properties for confidentiality and 
integrity [S[7913]17/27]. Such a combination, however, does not necessarily 
achieve the expected guarantees. 

We revisit an example from (modified in [25]): The composition of a 
tailor-made encryption scheme with a strongly unforgeable MAC. Briefly, the 
encryption first encodes each bit of the plaintext as two bits, such that the
probability that flipping one of these two bits has an effect depends on the original
value (i.e., $0 \mapsto 00$, $01$, or $10$; $1 \mapsto 11$), and encrypts this expanded string using a
one-time pad. Hence, if one encrypts an authenticated message, flipping a
ciphertext bit changes the contained message (so that the MAC verification
fails) with a probability that depends on the original plaintext value.
The resulting scheme achieves both confidentiality (by the one-time pad) and 
integrity (in the sense of INT-PTXT, by the unforgeability of the MAC), but the 
different success probabilities for the MAC verification leak information about 
the message, which is often described as a breach of confidentiality [I8]. 
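
The dependence of the flip probability on the plaintext bit can be checked by enumeration. The following is a minimal sketch, assuming (this is not stated in the text) that a 0 bit is encoded as a uniformly chosen codeword among 00, 01, and 10; the function names are hypothetical. Since a one-time pad is XOR-based, flipping a ciphertext bit flips exactly the corresponding codeword bit, so the pad itself is omitted.

```python
from fractions import Fraction

# Codewords for each plaintext bit, as in the text: 0 -> 00, 01, or 10; 1 -> 11.
ENCODINGS = {0: ["00", "01", "10"], 1: ["11"]}


def decode_pair(pair: str) -> int:
    """Decode a two-bit codeword: only 11 decodes to 1."""
    return 1 if pair == "11" else 0


def flip(pair: str, position: int) -> str:
    """Flip one bit of a codeword (the effect of flipping the matching
    ciphertext bit under a one-time pad)."""
    bits = list(pair)
    bits[position] = "1" if bits[position] == "0" else "0"
    return "".join(bits)


def change_probability(bit: int, position: int = 0) -> Fraction:
    """Probability that the flip changes the decoded bit, over a uniform
    choice of codeword (an assumption of this sketch)."""
    codewords = ENCODINGS[bit]
    changed = sum(decode_pair(flip(cw, position)) != bit for cw in codewords)
    return Fraction(changed, len(codewords))
```

Under these assumptions a flip always changes an encoded 1 but changes an encoded 0 only one time in three, so observing whether MAC verification fails reveals information about the plaintext bit.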

The described scheme implements a confidential $\mathcal{F}_{VP}$-malleable channel, where
$\mathcal{F}_{VP}$ is value-preserving as described in Sect. 4.3. The weakness of this scheme is
not a deficiency in confidentiality; rather, the scheme achieves only a weak notion
of integrity. Note that, in terms of integrity, INT-PTXT is equivalent to WUF-CMA,⁸ which
is sufficient to construct an authenticated channel (where the adversary can
only forward or delete messages) from an insecure channel. Indeed, for channels
that are not confidential, the integrity guarantees specified by $\mathcal{F}_{VP}$ are equivalent


8 Weak unforgeability: Given an oracle for generating tags, it is infeasible for the 
adversary to generate a tag for a message that has not been queried at the oracle. 
to those of an authenticated channel: A simulator that knows the plaintext
messages can sample according to distributions that depend on these messages. 
This equivalence does not hold if the considered channels are confidential. 


4.5 A Critique of Game-Based Security Notions 


Starting from [14], the major part of research on the security of encryption 
schemes has been pursued in game-based models. There, however, it is often not 
immediately clear which assumptions and guarantees are encoded by the oracle 
queries and winning conditions of games. For instance, which of the a priori
different types of IND-CPA security described in Sect. 4.2 captures confidentiality
“best” (and why)? This lack of semantics abets the prevalence of security notions
that do not capture the security requirements exactly (see Sects. 4.2 and 4.3).
A further issue with game-based notions is that seemingly innocent changes 
may have a significant impact on the security guarantees. The security notion 
indistinguishability from random bits was introduced in [30] and is similar to 
IND-CPA. Yet, instead of an encryption of a random message, the game returns 
a uniformly random string of appropriate length. The way this length is chosen, 
however, is crucial: In the original definition, this is determined by a function of 
the length of the queried message. If this choice is changed (as done, for example, 
in [15]) to the length of an encryption of the queried message, information
about the plaintext can leak via the length of the ciphertext! A further
example is the weakness of the INT-PTXT notion described in Sect. 4.3.
Moreover, several attack models in the definitions described in the literature
seem inappropriate for practical applications. One example is IND-CCA1,⁹
where the receiver stops decrypting adversarially generated ciphertexts after the
first message has been sent honestly. Also, certain terms such as NM-CPA are
actually misleading: An attack exploiting the malleability of an encryption scheme
is necessarily mounted by injecting or replacing ciphertexts. A more appropriate 
correspondence for this type of notion is a CCA attack on a single-use channel. 


5 Conclusion 


We have defined and analyzed confidentiality and integrity notions for sym- 
metric encryption schemes using the paradigm of constructive cryptography. 
The resulting security definitions are composable and have clear semantics: The 
guarantees of a cryptographic protocol appear explicitly in the description of the 
constructed resource. We have shown how existing game-based notions can be 
translated into guarantees in this setting, which makes their semantics explicit. 
Additionally, this analysis has uncovered a weakness in the notion INT-PTXT, 
and it has shown that INT-CTXT and IND-CCA are artificially strict. 


9 In the CCA1 game, the adversary loses access to the decryption oracle after the
first call to the challenge oracle. This corresponds to the situation where the receiver 
only decrypts messages until the first message has been generated by the sender. 
Abstract. Physical cryptographic devices inadvertently leak information
through numerous side-channels. Such leakage is exploited by so-called
side-channel attacks, which often allow for a complete security
breach. A recent trend in cryptography is to propose formal models to
incorporate leakage into the model and to construct schemes that are 
provably secure within them. 

We design a general compiler that transforms any cryptographic 
scheme, e.g., a block-cipher, into a functionally equivalent scheme which 
is resilient to any continual leakage provided that the following three re- 
quirements are satisfied: (i) in each observation the leakage is bounded, 
(ii) different parts of the computation leak independently, and (iii) the 
randomness that is used for certain operations comes from a simple (non- 
uniform) distribution. In contrast to earlier work on leakage resilient cir- 
cuit compilers, which relied on computational assumptions, our results 
are purely information-theoretic. In particular, we do not make use of 
public key encryption, which was required in all previous works. 


1 Introduction 


Leakage resilient cryptography attempts to incorporate side-channel information 
leakage into standard cryptographic models and to design new cryptographic 
schemes that provably withstand such leakages under reasonable physical as- 
sumptions. The “holy grail” in leakage-resilient cryptography is a generic method 
to provably protect any cryptographic computation against a broad, well-defined 
and realistic class of side-channel leakages. This fundamental question was first
studied in the work of Ishai et al. [ISW03], who initiated the concept of
leakage resilient circuit compilers. A circuit compiler takes a description of a
(Boolean) circuit $\Gamma$ as input and outputs a transformed (Boolean) circuit $\Gamma'$
with the same functionality, but with resilience to certain well-defined classes
of leakage. The authors consider a very specific type of leakage, namely, an
adversary who learns the values of up to $n \in \mathbb{N}$ internal wires in each execution
of $\Gamma'$. Security is proven by a simulation-based argument. More precisely, it is
shown that any (computationally unbounded) adversary that learns the value of
up to $n$ internal wires in each execution of $\Gamma'$ has only a negligible advantage
over an adversary that only views the inputs/outputs of the original circuit $\Gamma$.

The result of Ishai et al. shows security for a very restricted class of leakages, 
namely, security is proven only against the specific attack of learning the val- 
ues of n wires. The question that motivates our work is whether, analogously 
to [ISW03], we can protect any computation against the much broader class of 
polynomial-time computable leakages. This question has been answered
affirmatively in the recent feasibility results of Juma and Vahlis [JV10] and Goldwasser
and Rothblum [GR10] by additionally making use of the prominent “only
computation leaks information” assumption [MR04]. The security of both compilers,
however, relies on heavy cryptographic machinery by using public key encryption
to “encrypt” the secret state and the whole computation of $\Gamma$.¹

At first sight, it may look natural to rely on some form of cryptographic en- 
cryption, if we want to achieve security against any polynomial-time computable 
leakage function. For instance, it is necessary to “encrypt” the secret state of $\Gamma$,
as already a single bit of information leaking about the original secret state
makes simulation-based security impossible. Perhaps surprisingly, in this paper
we show that cryptographically secure encryption schemes are not necessary to
construct leakage resilient circuit compilers for polynomial-time computable
leakages. More precisely, we show that even an unbounded adversary with continuous
leakage access to $\Gamma'$ only gains a negligible advantage over an adversary with
only black-box access to $\Gamma$.

Similar to earlier work, we make certain restrictions on the leakage. We follow 
the work of Dziembowski and Pietrzak [DP08], and allow the leakage to be 
arbitrary as long as the following two restrictions are satisfied: 


1. Bounded leakage: the amount of leakage in each round is bounded to $\lambda$
bits (but overall can be arbitrarily large).

2. Independent leakage: the computation can be structured into sub-computations,
where each part of the computation leaks independently (we define
the term “sub-computation” below).


Formally, this is modeled by letting the adversary for each observation choose a
leakage function $f$ with range $\{0,1\}^\lambda$, and then giving her $f(\tau)$, where $\tau$ is all
the data that has been accessed in the current sub-computation. In addition, we
require access to a source of correlated randomness generated in a leak-free way 


1 More precisely, Juma and Vahlis require fully homomorphic encryption, while Gold- 
wasser and Rothblum use a variant of the BHHO encryption scheme. 
(e.g., computed by a simple leak-free component). We provide more details on
our hardware assumptions below. 


On Independent Leakages. Variants of the assumption that different parts 
of the computation leak independently have been used in several works 
[Pic09, KP10, GR10, JV10]. In its weakest form, the assumption says that
the state is divided into two parts that leak independently. This type of
assumption is used, e.g., in the work on leakage resilient stream ciphers [DP08, Pic09].
Several stronger flavors have been used in the literature. For instance, in the
circuit compiler of Goldwasser and Rothblum [GR10] the computation is structured
into $O(s)$ sub-computations, where $s$ is the size of the original circuit. Of course,
in practice leakage is a global phenomenon, and requiring a large number of
independently leaking sub-computations is a strong assumption on the hardware.
We would like to emphasize, however, that many relevant global leakage
functions can be computed from independent leakages. This is not only true for the
prominent Hamming weight leakage, but more generally, for any affine leakage
function.
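
The remark that global affine leakages decompose into independent ones can be illustrated as follows. This is a minimal sketch with hypothetical names: each memory part leaks only a function of its own contents, and the adversary combines the local values afterwards. The modulus `q` of the affine function is an assumption of the sketch; each local value then costs about $\log_2 q$ bits of that part's leakage budget.

```python
def hamming_weight(bits):
    """Number of ones in a list of bits."""
    return sum(bits)


def combine_hamming_weight(parts):
    """Global Hamming weight recovered from per-part leakages only."""
    local_leakages = [hamming_weight(part) for part in parts]  # one value per part
    return sum(local_leakages)  # combined by the adversary, outside the device


def combine_affine(parts, coeffs, offset, q):
    """Affine leakage <coeffs, state> + offset (mod q), recovered from
    per-part partial sums; coeffs is split in the same way as the state."""
    local_leakages = [
        sum(c * x for c, x in zip(part_coeffs, part)) % q
        for part_coeffs, part in zip(coeffs, parts)
    ]
    return (sum(local_leakages) + offset) % q
```

In both cases the combined value equals the global leakage function evaluated on the concatenated state, although no single leakage function ever sees more than one part.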


On the Relation between Leakage Granularity and the Amount of 
Leakage. We show a relation between the granularity level of the independent 
leakage assumption and the amount of leakage that can be tolerated per
observation. More precisely, in our basic setting we assume that the computation is
structured into $2s$ parts that leak independently, where $s$ is the number of gates
in $\Gamma$ (this is comparable to the model of [GR10]). Here, the amount of leakage
can increase linearly with the size of the circuit. Alternatively, we may settle for
weaker independence assumptions. That is, in the best case we may require only
two sub-components that leak independently. Of course this comes at a price: the
amount of leakage that is tolerated is then independent of the circuit's size. We notice
that we can tolerate more leakage if we assume some strong form of memory
erasures between sub-computations (cf. Section 6 for the details).


On Leak-Free Components. Leak-free components are used by recent leakage 
resilient circuit compilers [GKR08, FRR+10, JV10, GR10]. A leak-free component
leaks from its outputs, but the leakage is oblivious to its internals. In this work,
we use the leak-free component $\mathcal{O}$ that was recently introduced by Dziembowski
and Faust [DF11]. This component outputs two random vectors $A, B \leftarrow \mathbb{F}^n$ (with
$\mathbb{F}$ being a finite field and $n$ being a statistical security parameter) such that their
inner product is 0, i.e., $\sum_i A_i \cdot B_i = 0$. As discussed in [DF11], $\mathcal{O}$ exhibits several
properties that are beneficial for implementation. We refer the reader to [DF11]
for a more thorough discussion on the properties of $\mathcal{O}$.
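
The functionality of such a component can be sketched in software. The following is only an illustration under stated assumptions: the field is taken to be the integers modulo a small prime `P` (a hypothetical choice), the function names are hypothetical, and Python's `random` module is not a cryptographic source. Rejection sampling is used because it yields a pair that is exactly uniform among all orthogonal pairs; it says nothing about how a leak-free hardware component would be built.

```python
import random

P = 101  # hypothetical small prime; the field F is taken to be Z_P


def inner(u, v, p=P):
    """Inner product of two vectors over Z_p."""
    return sum(x * y for x, y in zip(u, v)) % p


def sample_orthogonal(n, p=P, rng=random):
    """Return (A, B) uniform among all pairs in F^n x F^n with <A, B> = 0.

    Rejection sampling: draw uniform pairs until one is orthogonal.
    A uniform pair is accepted with probability roughly 1/p."""
    while True:
        a = [rng.randrange(p) for _ in range(n)]
        b = [rng.randrange(p) for _ in range(n)]
        if inner(a, b, p) == 0:
            return a, b
```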


1.1 Our Contributions 


We propose a general transformation (also called the “compiler”) that takes any
circuit $\Gamma$ computing over a finite field $\mathbb{F}$ and transforms it into $\Gamma'$ in such a
way that (1) the circuit $\Gamma'$ computes the same function as $\Gamma$, and (2) any
(computationally unbounded) adversary that obtains continuous leakage from
$\Gamma'$ gains only negligible advantage over an adversary with only black-box
access to $\Gamma$. We emphasize that in contrast to earlier works in similar leakage
models [GR10, JV10], we do not use public key encryption to achieve leakage
resilience. This makes our results significantly more efficient.

Our construction is secure in the continuous leakage setting with adaptive
queries. That is, we assume that the circuit $\Gamma$ can be initialized (during a trusted
set-up phase) with some secret state, and is then queried by an adversary $\mathcal{S}$ on
adaptively chosen inputs $X^1, \ldots, X^\ell$. For each $i$ let $Y^i := \Gamma(X^i, \mathit{state})$ be the
outcome of the $i$th query. To define security, we consider an adversary $\mathcal{A}$ that
attacks $\Gamma'$ and gets the same information (i.e., pairs $(X^1, Y^1), \ldots, (X^\ell, Y^\ell)$ for
$X^i$'s chosen by him) plus the leakage from each computation. Informally, the
security definition requires that for every such (computationally unbounded)
adversary $\mathcal{A}$, there exists $\mathcal{S}$ with only black-box access to $\Gamma$ that produces the
same output as $\mathcal{A}$. The formal definition is given in a later section. For simplicity,
in the formal model we consider only the case where the adversary is allowed
to observe the computation once. For readers familiar with the work on leakage
resilient circuits, this is the case of stateless circuits. We briefly
discuss how to extend our result to the continuous leakage setting in Section 6.

We emphasize that the running time of our simulator S is polynomial in the 
running time of $\mathcal{A}$. This is necessary to protect circuits $\Gamma$, which hide the secret
key only computationally, which is the case for most prominent cryptographic
schemes. This is in contrast to the recent work of Dziembowski and Faust [DF11],
which considers efficient transformations for cryptographic schemes that hide the
secret key information theoretically (e.g., Okamoto signatures or Cramer-Shoup
encryption).


1.2 Comparison to Related Work 


An extension of the circuit compiler of Ishai et al. [ISW03] (mentioned above)
was proposed by Faust et al. [FRR+10]. The authors use similar techniques
as [ISW03] based on secret sharing but give a significantly improved security
analysis considering computationally weak (e.g., $\mathsf{AC}^0$) and noisy leakages. Similar
to our work, the results of [FRR+10] hold in the information theoretic
setting. The leak-free components that are used in earlier works are similar in
spirit to the component used in our work. In [FRR+10], the leak-free component
outputs an $n$-bit string with parity 0, while in the works of Juma and
Vahlis [JV10] and Goldwasser and Rothblum [GR10] it outputs ciphertexts that
encrypt 0 using the underlying public-key encryption scheme. Except for the
work of Juma and Vahlis, all leakage resilient circuit compilers (including ours)
require at least one leak-free component for each gate in the original circuit $\Gamma$.

We finally remark that our results do not imply the recent results of Dziembowski
and Faust [DF11]. More precisely, although we use the same trusted source
$\mathcal{O}$ as [DF11], the schemes of [DF11] cannot be obtained by using our circuit
compiler. The reasons for this are twofold: first, the protocols of [DF11] only use the
leak-free component for the refreshing of the secret key, while our protocols need
to use $\mathcal{O}$ for each gate of the original circuit. Second, their implementations of
standard cryptographic schemes are significantly more efficient: while we work on
the gate level and blow up the circuit's size by O(n*), Dziembowski and Faust
directly exploit homomorphic properties of cryptographic schemes and increase the
size only by a factor of $O(n)$. Unfortunately, however, these techniques are limited
to certain schemes such as the Okamoto identification and the Cramer-Shoup
encryption.


2 Preliminaries 


For a set $S$ we denote by $X \leftarrow S$ the process of drawing $X$ uniformly from $S$.
A vector $V$ is a row vector, and we denote by $V^T$ its transposition. We let $\mathbb{F}$ be
a finite field and for $m, n \in \mathbb{N}$, let $\mathbb{F}^{m \times n}$ denote the set of $m \times n$ matrices over
$\mathbb{F}$. For a matrix $M \in \mathbb{F}^{m \times n}$ and an $m$-element vector $V \in \mathbb{F}^m$ we denote by $V \cdot M$
the $n$-element vector that results from matrix multiplication of $V$ and $M$. For a
natural number $n$ let $(0)^n = (0, \ldots, 0)$. We use $V[i]$ to denote the $i$th element
of a vector $V$ and $V[i, \ldots, j]$ to denote the elements $i, i+1, \ldots, j$ of $V$. For two
vectors $V \in \mathbb{F}^n$, $W \in \mathbb{F}^m$ we denote by $V \| W$ their concatenation and by $V \otimes W$
we will mean a vector in $\mathbb{F}^{nm}$ defined as


$V \otimes W := (V_1 W_1, \ldots, V_1 W_m, V_2 W_1, \ldots, V_2 W_m, \ldots, V_n W_1, \ldots, V_n W_m).$ (1)


Finally, let $\langle V, W \rangle$ denote the inner product of $V$ and $W$. We will use the fact
that the inner product is linear, i.e. $\langle a \cdot V + V', W \rangle = a \cdot \langle V, W \rangle + \langle V', W \rangle$.
The “$\stackrel{d}{=}$” symbol denotes the equality of two distributions. For two random
variables $X_0, X_1$ over $\mathcal{X}$ we define the statistical distance between $X_0$ and $X_1$ as
$\Delta(X_0; X_1) = \sum_{x \in \mathcal{X}} 1/2 \, |\Pr[X_0 = x] - \Pr[X_1 = x]|$.
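
The two less common operations above, the product of Eq. (1) and the statistical distance, can be made concrete with a short sketch. The names are hypothetical, the field is taken to be the integers modulo a prime `p`, and distributions are represented as dictionaries mapping outcomes to probabilities; these representation choices are assumptions of the sketch.

```python
from fractions import Fraction


def tensor(v, w, p):
    """V (x) W from Eq. (1): all products V_i * W_j, ordered with i as the
    outer index and j as the inner index, reduced modulo p."""
    return [(vi * wj) % p for vi in v for wj in w]


def statistical_distance(dist0, dist1):
    """Half the L1 distance between two finite distributions, each given
    as a dict mapping outcomes to probabilities."""
    support = set(dist0) | set(dist1)
    return sum(abs(dist0.get(x, 0) - dist1.get(x, 0)) for x in support) / 2
```

For instance, the distance between a uniform bit and the constant 0 is 1/2, and two identical distributions have distance 0.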


2.1 Leakage Model 


To formally model leakage, we follow Dziembowski and Faust [DF11] and only
recall some important details here. We model independent leakage from memory
parts in form of a leakage game, where the adversary can adaptively learn
information from the memory parts. More precisely, for some $c, \ell \in \mathbb{N}$ let
$M_1, \ldots, M_\ell \in \{0,1\}^c$ denote the contents of the memory parts, then we define a
$\lambda$-leakage game played between an adaptive adversary $\mathcal{A}$, called a $\lambda$-limited
leakage adversary, and a leakage oracle $\Omega(M_1, \ldots, M_\ell)$ as follows. For some $m \in \mathbb{N}$,
the adversary $\mathcal{A}$ can adaptively issue a sequence $\{(x_i, f_i)\}_{i=1}^{m}$ of requests to the
oracle $\Omega(M_1, \ldots, M_\ell)$, where $x_i \in \{1, \ldots, \ell\}$ and $f_i : \{0,1\}^c \to \{0,1\}^{\lambda_i}$ with
$\lambda_i \le \lambda$. To each such query the oracle replies with $f_i(M_{x_i})$ and we say that
in this case the adversary $\mathcal{A}$ retrieved the value $f_i(M_{x_i})$ from $M_{x_i}$. The only
restriction is that in total the adversary does not retrieve more than $\lambda$ bits
from each memory part. In the following, let $(\mathcal{A} \rightleftarrows (M_1, \ldots, M_\ell))$ be the
output of $\mathcal{A}$ at the end of this game. Without loss of generality, we assume that
$(\mathcal{A} \rightleftarrows (M_1, \ldots, M_\ell)) = (f_1(M_{x_1}), \ldots, f_m(M_{x_m}))$.
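
The bookkeeping of the leakage game can be sketched as a small oracle class. This is an illustration only, with hypothetical names: memory parts and leakage values are represented as strings of `0`/`1` characters, and the adversary declares the output length of each leakage function up front so that the per-part budget can be enforced.

```python
class LeakageOracle:
    """Leakage oracle for memory parts M_1, ..., M_l with a per-part budget
    of lam bits (parts and answers are strings of '0'/'1' characters)."""

    def __init__(self, parts, lam):
        self.parts = list(parts)
        self.lam = lam
        self.retrieved = [0] * len(self.parts)  # bits leaked so far, per part
        self.transcript = []                    # all answers, in query order

    def query(self, x, f, out_bits):
        """Answer request (x, f): apply f to part x (1-based index).

        Raises ValueError if the request would exceed the budget of part x
        or if f does not return exactly out_bits bits."""
        if self.retrieved[x - 1] + out_bits > self.lam:
            raise ValueError("leakage budget of this memory part exceeded")
        value = f(self.parts[x - 1])
        if len(value) != out_bits or set(value) - {"0", "1"}:
            raise ValueError("leakage function has the wrong output format")
        self.retrieved[x - 1] += out_bits
        self.transcript.append(value)
        return value
```

At the end of the game, `transcript` plays the role of the adversary's output, the tuple of all retrieved values.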


LEAKAGE FROM COMPUTATION. We model the computation that is carried out
on a device as an $\ell$-party protocol $\Pi = (P_1, \ldots, P_\ell)$, which is executed between
the parties $(P_1, \ldots, P_\ell)$, and an adversary is allowed to obtain partial information
(the leakage) from the internal state of the players. Initially, some parties may
hold inputs, and we denote by $S_i$ the input of $P_i$. The execution of $\Pi$ with initial
inputs $S_1, \ldots, S_\ell$, denoted by $\Pi(S_1, \ldots, S_\ell)$, is structured into sub-computations.
In each sub-computation one player is active and sends messages to the other
players. These messages can depend on his input (i.e., his initial state), his local
randomness, and the messages that he received in earlier rounds. At the end of
the protocol's execution, the players $P_1, \ldots, P_\ell$ output values $S'_1, \ldots, S'_\ell$, resp.
(some of these values may be empty). For each player $P_i$, we denote the local
randomness that is used by $P_i$ during the execution of $\Pi$ and all the messages
that are received or sent (including the messages from the user of the protocol)
by $\mathit{view}_i$. We assume that after the protocol terminates, the adversary $\mathcal{A}$ plays
a $\lambda$-leakage game against the leakage oracle $\Omega(\mathit{view}_1, \ldots, \mathit{view}_\ell)$. We will use
the following convention in order to simplify the exposition: while describing a 
protocol we will explicitly describe the view of each player, sometimes omitting 
redundant variables. For instance, if the view contains variables X,Y, Z, such 
that always Z = X @ Y, then we will omit Z, as it can be calculated by the 
leakage function from X and Y. 


2.2 Leakage-Resilient Storage 


Davi et al. recently introduced the notion of leakage-resilient storage
(LRS) $\Phi = (\mathrm{Encode}, \mathrm{Decode})$. An LRS allows one to store a secret in an “encoded
form” such that even given leakage from the encoding no adversary learns
information about the encoded values. One of the constructions that the authors
propose uses two-source extractors and can be shown to be secure in the
independent leakage model. More precisely, an LRS for the independent leakage
model is defined for message space $\mathcal{M}$ and encoding space $\mathcal{L} \times \mathcal{R}$ as follows:


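- Encode: $\mathcal{M} \to \mathcal{L} \times \mathcal{R}$ is a probabilistic, efficiently computable function and
- Decode: $\mathcal{L} \times \mathcal{R} \to \mathcal{M}$ is a deterministic, efficiently computable function such
that for every $S \in \mathcal{M}$ we have $\mathrm{Decode}(\mathrm{Encode}(S)) = S$.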


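An LRS $\Phi$ is said to be $(\lambda, \varepsilon)$-secure, if for any $S, S' \in \mathcal{M}$ and any $\lambda$-limited
adversary $\mathcal{A}$, we have $\Delta((\mathcal{A} \rightleftarrows (L, R)); (\mathcal{A} \rightleftarrows (L', R'))) \le \varepsilon$, where $(L, R) \leftarrow \mathrm{Encode}(S)$
and $(L', R') \leftarrow \mathrm{Encode}(S')$. In this paper, we
consider a leakage-resilient storage scheme $\Phi^n_{\mathbb{F}}$ that allows one to efficiently store
elements from $\mathcal{M} = \mathbb{F}$. It is a variant of a scheme proposed in prior work and
based on the inner-product extractor. For some security parameter $n \in \mathbb{N}$,
$\Phi^n_{\mathbb{F}} := (\mathrm{Encode}^n_{\mathbb{F}}, \mathrm{Decode}^n_{\mathbb{F}})$ is defined as follows: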


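- $\mathrm{Encode}^n_{\mathbb{F}}(S)$:
1. Sample $(L[2, \ldots, n], R[2, \ldots, n]) \leftarrow (\mathbb{F}^{n-1})^2$.
2. Set $L[1] \leftarrow \mathbb{F} \setminus \{0\}$ and $R[1] := L[1]^{-1} \cdot (S - \langle L[2, \ldots, n], R[2, \ldots, n] \rangle)$.
Output $(L, R)$.

- $\mathrm{Decode}^n_{\mathbb{F}}(L, R)$: Output $\langle L, R \rangle$.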
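
The encoding and decoding steps above translate almost line by line into code. The following is a minimal sketch, assuming the field is the integers modulo a small prime `P` (a hypothetical choice) and using Python's non-cryptographic `random` module purely for illustration; the three-argument `pow` with exponent `-1` (Python 3.8 and later) computes the modular inverse.

```python
import random

P = 101  # hypothetical small prime; the field F is taken to be Z_P


def inner(u, v, p=P):
    """Inner product of two vectors over Z_p."""
    return sum(x * y for x, y in zip(u, v)) % p


def encode(secret, n, p=P, rng=random):
    """Encode a field element as (L, R) with <L, R> = secret and L[1] != 0."""
    l_tail = [rng.randrange(p) for _ in range(n - 1)]  # L[2..n]
    r_tail = [rng.randrange(p) for _ in range(n - 1)]  # R[2..n]
    l_first = rng.randrange(1, p)                      # L[1] from F \ {0}
    r_first = pow(l_first, -1, p) * (secret - inner(l_tail, r_tail, p)) % p
    return [l_first] + l_tail, [r_first] + r_tail


def decode(l, r, p=P):
    """Decode an encoding by taking the inner product of its two halves."""
    return inner(l, r, p)
```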
The property that $L[1] \neq 0$ will be useful in the “generalized multiplication”
protocol (cf. Section 4.2). It is easy to see that $\Phi^n_{\mathbb{F}}$ is correct, i.e.:


$\mathrm{Decode}^n_{\mathbb{F}}(\mathrm{Encode}^n_{\mathbb{F}}(S)) = S.$


Security is shown in the following lemma whose proof appears in the full version 
of this paper. 


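Lemma 1. Let $n \in \mathbb{N}$ and let $\mathbb{F}$ be such that $|\mathbb{F}| = \Omega(n)$. For any $1/2 > \delta > 0$, $\gamma > 0$
the LRS $\Phi^n_{\mathbb{F}}$ as defined above is $(\lambda, \varepsilon)$-secure, with $\lambda = (1/2 - \delta) n \log |\mathbb{F}| - \log \gamma^{-1} - 1$
and $\varepsilon = 2m(|\mathbb{F}|^{3/2 - n\delta} + |\mathbb{F}| \gamma)$.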

We instantiate Lemma 1 with concrete parameters in the next corollary.

Corollary 1. Suppose $|\mathbb{F}| = \Omega(n)$. Then, LRS $\Phi^n_{\mathbb{F}}$ is $(0.49 \cdot \log_2 |\mathbb{F}^n| - 1, \mathrm{negl}(n))$-secure,
for some negligible function negl.


3 An Informal Description of the Protocol 


In this section we informally describe our circuit compiler, which is based on the
LRS scheme $\Phi^n_{\mathbb{F}}$. Our starting point is the result of [DF11], where a protocol
$\mathrm{Refresh}^n_{\mathbb{F}}$ is proposed to refresh secrets encoded with $\Phi^n_{\mathbb{F}}$. $\mathrm{Refresh}^n_{\mathbb{F}}$ is run between
two parties $P_L$ and $P_R$, which initially hold $L$ and $R$ in $\mathbb{F}^n$. At the end of the
protocol, $P_L$ holds $L'$ and $P_R$ holds $R'$ such that $\langle L, R \rangle = \langle L', R' \rangle$. The protocol
can be repeated continuously to refresh the encoding and satisfies the following
security requirement: even given continuous leakage independently from the
parties $P_L$ and $P_R$ no adversary can learn the encoded secret $\langle L, R \rangle$.

In order to create a general circuit compiler in the independent leakage model,
all we need is to perform arithmetic operations on the encoded secrets in a
leakage-resilient way, using the LRS $\Phi^n_{\mathbb{F}}$. This is similar to the methods used in the
MPC literature: first, the secret is secret-shared between the parties (in our case:
“encoded”), and then the operations are performed “gate-by-gate” in a secure way.
At the end the outputs of the computation are reconstructed in the following
way: one of the players, $P_L$, say, sends his share $L'$ of the output to $P_R$ and $P_R$
computes $\mathrm{Decode}^n_{\mathbb{F}}(L', R')$. We use a similar approach in this paper.

To illustrate this approach, consider the simple case of a circuit that multiplies
a constant $a$ with a secret $S$ encoded as $(L, R)$. If $L$ is held by $P_L$ and $R$ is
held by $P_R$, then one of the players, $P_L$, say, multiplies his vector by $a$ (as
$\langle a \cdot L, R \rangle = a \cdot \langle L, R \rangle$). Also, addition of a constant $c$ to $S$ is simple: the player
$P_L$ sends $x = L[1]$ to $P_R$ (for simplicity assume that $L[1] \neq 0$), and then $P_R$
sets $R' = R + (x^{-1} \cdot c, 0, \ldots, 0)$ and $P_L$ sets $L' = L$. We notice that $(L', R')$
was computed from $(L, R)$ just by sending one field element from $P_L$ to $P_R$,
and in particular it did not involve computing $\langle L, R \rangle$. We call this protocol
$\mathrm{AddConst}_{\mathbb{F}}(c, (L, R))$.
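
Both operations on encodings can be sketched in a few lines. As before, this is an illustration under stated assumptions: the field is the integers modulo a hypothetical small prime `P`, the function names are hypothetical, and the two parties are modeled simply as the two arguments `l` (held by $P_L$) and `r` (held by $P_R$).

```python
P = 101  # hypothetical small prime; the field F is taken to be Z_P


def inner(u, v, p=P):
    """Inner product of two vectors over Z_p."""
    return sum(x * y for x, y in zip(u, v)) % p


def mul_const(a, l, r, p=P):
    """Multiply the encoded secret by a: P_L scales its vector locally."""
    return [(a * x) % p for x in l], list(r)


def add_const(c, l, r, p=P):
    """Add c to the encoded secret: P_L sends x = L[1], and P_R adds
    x^{-1} * c to the first coordinate of R."""
    x = l[0]  # the single field element sent from P_L to P_R
    if x % p == 0:
        raise ValueError("L[1] must be non-zero")
    r_new = list(r)
    r_new[0] = (r_new[0] + pow(x, -1, p) * c) % p
    return list(l), r_new
```

Neither function ever computes the inner product of `l` and `r`, which mirrors the observation in the text that the secret itself is never reconstructed.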

The only ingredient that is missing for computing arbitrary functionalities is
a protocol for leakage-resilient multiplication of two encoded secrets. The
construction of such a protocol is the main contribution of this paper (for
technical reasons, we construct in Section 4.2 a protocol for a slightly more general
functionality, which we call “generalized multiplication”). Suppose we have two
secrets $S^0 \in \mathbb{F}$ and $S^1 \in \mathbb{F}$ encoded as $(L^0, R^0)$ and $(L^1, R^1)$, respectively.
Suppose further that player $P_L$ holds $(L^0, L^1)$ and player $P_R$ holds $(R^0, R^1)$.
Their goal is to compute $L'', R'' \in \mathbb{F}^n$ in a leakage-resilient way such that
$\langle L'', R'' \rangle = S^0 \cdot S^1$ and $L''$ is held by $P_L$, while $R''$ is held by $P_R$. Our first
observation is that $\langle L^0 \otimes L^1, R^0 \otimes R^1 \rangle = \langle L^0, R^0 \rangle \cdot \langle L^1, R^1 \rangle = S^0 \cdot S^1$, which
follows from simple linear algebra. Hence, $(L^0 \otimes L^1, R^0 \otimes R^1)$ encodes the secret
$S^0 \cdot S^1$ in the $\Phi^{n^2}_{\mathbb{F}}$ scheme. Note that this protocol, so far, is non-interactive
so it is clearly secure. The disadvantage of this protocol is that the length of
the encoding grows exponentially with the depth of $\Gamma$. Therefore, we need a
method of reducing the length of this encoding. This can be done in the
following way. First, the players refresh the $(L^0 \otimes L^1, R^0 \otimes R^1)$ encoding with the
$\mathrm{Refresh}^{n^2}_{\mathbb{F}}$ protocol. Let $(L', R') \in \mathbb{F}^{n^2} \times \mathbb{F}^{n^2}$ be the result of this refreshing.
Then, the players reconstruct in clear the secret encoded by the final $n(n-1)$
elements of $L'$ and $R'$. More precisely, the player $P_L$ sends $L'[n+1, \ldots, n^2]$
to $P_R$, and $P_R$ computes $d = \langle L'[n+1, \ldots, n^2], R'[n+1, \ldots, n^2] \rangle$. We now
clearly have that $S^0 \cdot S^1 = \langle L', R' \rangle = \langle L'[1, \ldots, n], R'[1, \ldots, n] \rangle + d$. Hence,
$(L'[1, \ldots, n], R'[1, \ldots, n])$ encodes $S^0 \cdot S^1$ minus $d$. Since $d$ can be published by $P_R$
we can now use the protocol $\mathrm{AddConst}_{\mathbb{F}}(d, (L'[1, \ldots, n], R'[1, \ldots, n]))$, and add
a constant $d$ to $(L'[1, \ldots, n], R'[1, \ldots, n])$. The output $(L'', R'')$ of the protocol
is the result of this operation. Observe that the use of the refreshing protocol is
crucial, as $(L^0 \otimes L^1)[n+1, \ldots, n^2]$ gives almost complete information about $L^0$
and $L^1$.
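
The arithmetic of this multiplication idea can be checked with a short sketch. One caveat should be stated loudly: the sketch below deliberately omits the refreshing step, so it illustrates only why the result decodes correctly and is not leakage-resilient (as the text notes, the unrefreshed tail reveals almost everything about $L^0$ and $L^1$). The field is taken to be the integers modulo a hypothetical small prime `P`, and all function names are hypothetical.

```python
P = 101  # hypothetical small prime; the field F is taken to be Z_P


def inner(u, v, p=P):
    """Inner product of two vectors over Z_p."""
    return sum(x * y for x, y in zip(u, v)) % p


def tensor(v, w, p=P):
    """All products v_i * w_j, with i as the outer index (cf. Eq. (1))."""
    return [(x * y) % p for x in v for y in w]


def multiply_encoded(l0, r0, l1, r1, p=P):
    """Correctness-only sketch of multiplying two encoded secrets.

    WARNING: the Refresh step between the tensoring and the truncation is
    skipped here, so this function is NOT secure against leakage."""
    n = len(l0)
    lt, rt = tensor(l0, l1, p), tensor(r0, r1, p)  # length-n^2 encoding
    d = inner(lt[n:], rt[n:], p)   # P_L sends lt[n:], P_R computes d
    l_out, r_out = lt[:n], rt[:n]  # encodes S0 * S1 - d
    # AddConst(d, ...): requires l_out[0] != 0, which holds when the first
    # coordinates of l0 and l1 are both non-zero.
    r_out[0] = (r_out[0] + pow(l_out[0], -1, p) * d) % p
    return l_out, r_out
```

The output has length $n$ again, so repeated multiplications no longer blow up the encoding.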


4 The Ingredients 


In this section, we describe the two main ingredients of our compiler construction:
the “refreshing” protocol for $\Phi^n_{\mathbb{F}}$ (cf. Section 4.1) and the “generalized
multiplication” protocol (cf. Section 4.2). The latter protocol will use the former as a
sub-routine. In the full version of this paper, we show that these two components
satisfy a simple security property called reconstructibility. This notion was
introduced recently in [FRR+10] and essentially says that the view of the parties in a
protocol can be efficiently reconstructed from just knowing the encoded inputs
and outputs. For our setting, we modify this notion and define reconstruction
as a protocol run between players $P_L$ and $P_R$, where the efficiency criterion of the
reconstructor is the amount of information exchanged between the parties. For
instance, for the generalized multiplication the reconstructor protocol is run
between $P_L$ with input $(L^0, L^1, L'')$ and $P_R$ with input $(R^0, R^1, R'')$ and computes
$\mathit{view}_L$ and $\mathit{view}_R$ with only one field element of communication.


4.1 Leakage-Resilient Refreshing of LRS 


In this section, we propose a simple variant of the refreshing protocol proposed
in [DF11] (cf. Section 3) for the LRS $\Phi^n_{\mathbb{F}}$. As described in the introduction,
we assume that the players have access to a leak-free component that samples
uniformly at random pairs of orthogonal vectors. Technically, we will assume that
we have an oracle $\mathcal{O}'$ that samples a uniformly random vector $((A, \tilde{A}), (B, \tilde{B})) \in (\mathbb{F}^n)^4$,
subject to the constraint that the following three conditions hold:


1. $\langle A, B \rangle + \langle \tilde{A}, \tilde{B} \rangle = 0$,
2. $A \neq (0)^n$, and
3. $\tilde{B} \neq (0)^n$.


Note that this oracle is different from the oracle $\mathcal{O}$ described in the introduction
(and used earlier in [DF11]) that simply samples pairs $(A, B)$ of orthogonal
vectors. It is easy to see, however, that this “new” oracle $\mathcal{O}'$ can be “simulated”
by the players that have access to $\mathcal{O}$ that samples pairs $(C, D)$ of orthogonal
vectors of length $2n$ each. First, observe that $C \in \mathbb{F}^{2n}$ can be interpreted as a
pair $(A, \tilde{A}) \in (\mathbb{F}^n)^2$ (where $A \| \tilde{A} = C$), and in the same way $D \in \mathbb{F}^{2n}$ can be
interpreted as a pair $(B, \tilde{B}) \in (\mathbb{F}^n)^2$ (where $B \| \tilde{B} = D$). By the basic properties
of the inner product we get that $\langle A, B \rangle + \langle \tilde{A}, \tilde{B} \rangle = \langle C, D \rangle = 0$. Hence, Condition
1 is satisfied. Conditions 2 and 3 can simply be verified by players $P_L$ and $P_R$,
respectively. If one of these conditions is not met, then the players sample a fresh
$(C, D)$ from $\mathcal{O}$. Obviously, this happens with a negligible probability $2 \cdot |\mathbb{F}|^{-n}$
only, so it has almost no impact on the efficiency of the protocol.
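
The simulation of the new oracle from the original one can be sketched as follows. The sketch makes several assumptions that are not in the text: the field is the integers modulo a hypothetical small prime `P`, the original oracle is replaced by a software stand-in based on rejection sampling, and the function names are hypothetical. The second function follows the splitting and re-sampling described above.

```python
import random

P = 101  # hypothetical small prime; the field F is taken to be Z_P


def inner(u, v, p=P):
    """Inner product of two vectors over Z_p."""
    return sum(x * y for x, y in zip(u, v)) % p


def oracle_o(length, p=P, rng=random):
    """Software stand-in for O: a uniform pair of orthogonal vectors,
    obtained by rejection sampling."""
    while True:
        c = [rng.randrange(p) for _ in range(length)]
        d = [rng.randrange(p) for _ in range(length)]
        if inner(c, d, p) == 0:
            return c, d


def oracle_o_prime(n, p=P, rng=random):
    """Simulate O': split a length-2n orthogonal pair (C, D) into halves
    and re-sample until the first half of C and the second half of D are
    both non-zero (Conditions 2 and 3)."""
    while True:
        c, d = oracle_o(2 * n, p, rng)
        a, a_tilde = c[:n], c[n:]
        b, b_tilde = d[:n], d[n:]
        if any(a) and any(b_tilde):  # checked by P_L and P_R, respectively
            return (a, a_tilde), (b, b_tilde)
```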

The reason for introducing Conditions 2 and 3 is to make the exposition simpler,
as it avoids dealing with the events that happen with negligible probability
(cf. the caption of Figure 1). The reason for having Condition 1 is slightly more
subtle and will be explained below.

The refreshing scheme is presented in Figure 1. The main idea behind this
protocol is as follows (for this high-level overview ignore Step 4, as it anyway
influences the execution only with negligible probability). Denote
$a := \langle A, B\rangle \ (= -\langle \tilde{A}, \tilde{B}\rangle)$.
Steps 2 and 3 are needed to refresh the share of $P_R$. This is done
by generating, with the "help" of $(A, B)$ (coming from $O'$), a vector $X$ such that


$\langle L, X\rangle = a$. (2)


Eq. (2) comes from simple linear algebra:
$\langle L, X\rangle = \langle L, B \cdot M^T\rangle = \langle L \cdot M, B\rangle = \langle A, B\rangle = a$.
Then, vector $X$ is added to the share of $P_R$ by setting (in Step 3)
$R' := R + X$. Hence we get
$\langle L, R'\rangle = \langle L, R\rangle + \langle L, X\rangle = \langle L, R\rangle + a$.
Symmetrically, in Steps 5 and 6 the players refresh the share of $P_L$, by first
generating $Y$ such that $\langle Y, R'\rangle = -a$, and then setting $L' := L + Y$.
By similar reasoning as before, we get
$\langle L', R'\rangle = \langle L, R'\rangle - a$, which, in turn, is equal to
$\langle L, R\rangle$. Hence, the refreshing is correct.
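The correctness argument above can be checked numerically with a small single-process sketch. It is only an illustration under stated assumptions: the prime `P`, all function names, the way the oracle output is sampled, and the rejection sampling used to obtain non-singular matrices are choices made for this example. Both players run in one process, so nothing here models leakage or the players' separate views.

```python
import random

P = 101  # illustrative prime; F = Z_P is an assumption of this sketch


def inner(u, v):
    return sum(x * y for x, y in zip(u, v)) % P


def is_invertible(M):
    """Gaussian elimination mod P; True iff the square matrix has full rank."""
    n = len(M)
    M = [row[:] for row in M]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % P != 0), None)
        if pivot is None:
            return False
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, P)
        for r in range(col + 1, n):
            f = M[r][col] * inv % P
            M[r] = [(a - f * b) % P for a, b in zip(M[r], M[col])]
    return True


def sample_matrix(v, target, rng):
    """Random non-singular M with v * M = target (row vector times matrix).
    Requires v != 0 and target != 0."""
    n = len(v)
    i = next(k for k in range(n) if v[k] != 0)
    inv = pow(v[i], -1, P)
    while True:
        M = [[rng.randrange(P) for _ in range(n)] for _ in range(n)]
        for j in range(n):
            # Fix entry (i, j) so that column j satisfies the linear constraint.
            rest = sum(v[k] * M[k][j] for k in range(n) if k != i) % P
            M[i][j] = (target[j] - rest) * inv % P
        if is_invertible(M):
            return M


def nonzero_vector(n, rng):
    while True:
        v = [rng.randrange(P) for _ in range(n)]
        if any(v):
            return v


def oracle(n, rng):
    """Hypothetical O': (A, A~, B, B~) with <A,B> + <A~,B~> = 0, A != 0, B~ != 0."""
    A, B_t = nonzero_vector(n, rng), nonzero_vector(n, rng)
    A_t = [rng.randrange(P) for _ in range(n)]
    B = [rng.randrange(P) for _ in range(n)]
    i = next(k for k in range(n) if A[k] != 0)
    rest = sum(A[k] * B[k] for k in range(n) if k != i) % P
    B[i] = (-inner(A_t, B_t) - rest) * pow(A[i], -1, P) % P
    return A, A_t, B, B_t


def refresh(L, R, rng):
    n = len(L)
    A, A_t, B, B_t = oracle(n, rng)                    # Step 1
    M = sample_matrix(L, A, rng)                       # Step 2: L * M = A
    X = [inner(M[i], B) for i in range(n)]             # Step 3: X = B * M^T
    R2 = [(r + x) % P for r, x in zip(R, X)]
    if not any(R2):                                    # Step 4 (negligible branch)
        return [rng.randrange(1, P)] + [rng.randrange(P) for _ in range(n - 1)], R2
    while True:
        Mt_T = sample_matrix(R2, B_t, rng)             # Step 5: M~ * R' = B~ (stored transposed)
        Y = [inner(Mt_T[j], A_t) for j in range(n)]    # Step 6: Y = A~ * M~
        L2 = [(l + y) % P for l, y in zip(L, Y)]
        if L2[0] != 0:                                 # Step 7: otherwise redo Step 5
            return L2, R2
```

Whenever the Step 4 branch is not taken, the output satisfies `inner(L2, R2) == inner(L, R)`, which is exactly the identity derived above.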

The security proof of this refreshing scheme appears in the full version of this
paper. The key property that is used there is that $X$ is generated "obliviously"
from $P_L$, and $Y$ is generated "obliviously" from $P_R$. In other words: $P_L$ gets no
information on $X$ except that $\langle L, X\rangle = -\langle Y, R'\rangle$, and a
symmetric fact holds for $P_R$. For more intuition behind this protocol the reader may
consult [DF11] (Sect. 3), where a similar refreshing scheme is constructed. The main
difference is that the protocol presented here refreshes the shares "completely", i.e.
the new encoding $(L', R')$ is completely independent from $(L, R)$ (except that it
encodes the same secret), while in [DF11] this was not the case. More precisely, in the
refreshing of [DF11], $A$, $\tilde{A}$, $B$, and $\tilde{B}$ were such that
$\langle A, B\rangle = \langle \tilde{A}, \tilde{B}\rangle = 0$, which
implied that in particular $\langle L, R' - R\rangle$ and $\langle L' - L, R'\rangle$
were equal to 0 (and hence $(L', R')$ was not independent from $(L, R)$). In our
protocol this is not the case, since $\langle A, B\rangle = a$ and
$\langle \tilde{A}, \tilde{B}\rangle = -a$ (where $a$ is random), and hence
$\langle L, R' - R\rangle$ and $\langle L' - L, R'\rangle$ are random. This
"independence" of encodings after refreshing is a very useful property for showing
security of composition of larger circuits.


Protocol $(L', R') \leftarrow \mathrm{Refresh}_F^n((L, R))$:
Input $(L, R)$: $L \in (F \setminus \{0\}) \times F^{n-1}$ is given to $P_L$ and $R \in F^n$ is given to $P_R$.


1. Let $(A, \tilde{A}, B, \tilde{B}) \leftarrow O'$ and give $(A, \tilde{A})$ to $P_L$ and $(B, \tilde{B})$ to $P_R$.


Refreshing the share of $P_R$:


2. Player $P_L$ generates a random non-singular matrix $M \in F^{n \times n}$ such that
$L \cdot M = A$ and sends it to $P_R$.
3. Player $P_R$ sets $X := B \cdot M^T$ and $R' := R + X$.


Refreshing the share of $P_L$:


4. If $R' = (0, \ldots, 0)$ then $P_R$ sends a message $\mu$ = "zero" to $P_L$. Player $P_L$
samples $L' \leftarrow (F \setminus \{0\}) \times F^{n-1}$. The players output $(L', R')$ and
finish this round of refreshing. Otherwise the player $P_R$ sends a message
$\mu$ = "nonzero" to $P_L$ and they execute the following:

5. Player $P_R$ generates a random non-singular matrix $\tilde{M} \in F^{n \times n}$ such that
$\tilde{M} \cdot R' = \tilde{B}$ and sends it to $P_L$.

6. Player $P_L$ sets $Y := \tilde{A} \cdot \tilde{M}$ and $L' := L + Y$.

7. If $L'[1] = 0$ then restart the procedure of refreshing the share of $P_L$, i.e.
go to Step 5.

Output: The players output $(L', R')$.
Views: The view $\mathrm{view}_L$ of player $P_L$ is $(L, A, M, \tilde{A}, \tilde{M}, \mu)$ and the
view $\mathrm{view}_R$ of player $P_R$ is $(R, B, M, \tilde{B}, \tilde{M}, \mu)$.


Fig. 1. Protocol $\mathrm{Refresh}_F^n$. Oracle $O'$ samples random vectors
$(A, \tilde{A}, B, \tilde{B}) \in (F^n)^4$ such that
(1) $\langle A, B\rangle = -\langle \tilde{A}, \tilde{B}\rangle$, (2) $A \neq (0)^n$, and
(3) $\tilde{B} \neq (0)^n$. Note that conditions (2) and (3) are needed, as otherwise it
might be impossible to find matrices $M$ and $\tilde{M}$ in Steps 2 and 5, respectively.
It is easy to see that $L'[1]$ has a uniform distribution over $F$, and hence restarting
part of the protocol in Step 7 happens with probability $|F|^{-1}$. Therefore, if $F$ is
large then this probability is negligible. In Sect. 6 we show how to change our protocol
so that the probability of restarting is negligible even if $|F|$ is small (e.g. constant).


4.2 Leakage-Resilient Computation of Generalized Multiplication 


We now present a leakage-resilient protocol for computing a "generalized
multiplication" function $f(S^0, S^1, c) = c - S^0 \cdot S^1$, where the values
$S^0 \in F$ and $S^1 \in F$ are encoded by an LRS
$\Phi_F^n = (\mathrm{Encode}_F^n, \mathrm{Decode}_F^n)$ (let $(L^0, R^0)$ and
$(L^1, R^1)$ be the respective encodings), and $c \in F$ is a constant. The result
$f(S^0, S^1, c)$ of the computation is encoded by $(L'', R'')$. This construction has
already been discussed informally earlier in the paper. The formal description appears
in Figure 2. It uses the $\mathrm{Refresh}_F^{n^2}$ protocol as a sub-routine, and hence
also relies on the special leak-free oracle $O'$. It is easy to see that this protocol
is correct. More formally, for any inputs $L^0, R^0, L^1, R^1 \in F^n$ and $c \in F$ we
have that
$\mathrm{Decode}_F^n(L'', R'') = c - \mathrm{Decode}_F^n(L^0, R^0) \cdot \mathrm{Decode}_F^n(L^1, R^1)$,
where $(L'', R'') \leftarrow \mathrm{Mult}_F^n((L^0, R^0), (L^1, R^1), c)$. The security
properties of this protocol are defined and proven in the full version of this paper,
where we show that the multiplication protocol is reconstructible with low
communication between the parties $P_L$ and $P_R$.


Protocol $(L'', R'') \leftarrow \mathrm{Mult}_F^n((L^0, R^0), (L^1, R^1), c)$:


Input $((L^0, R^0), (L^1, R^1), c)$: $L^0, L^1 \in (F \setminus \{0\}) \times F^{n-1}$ are
given to $P_L$ and $R^0, R^1 \in F^n$ are given to $P_R$. The field element $c \in F$ is
given to both players.


1. The players $P_L$ and $P_R$ run the
$\mathrm{Refresh}_F^{n^2}(L^0 \otimes L^1, R^0 \otimes R^1)$ protocol. Let $L'$ and $R'$ be
their respective outputs, and let $\mathrm{view}_L$ and $\mathrm{view}_R$ be their
respective views.

2. Player $P_L$ sends $x := L'[1]$ and the last $n(n-1)$ elements of $L'$ (i.e. the vector
$L'[n+1, \ldots, n^2]$) to $P_R$. Player $P_R$ computes
$d := \langle L'[n+1, \ldots, n^2], R'[n+1, \ldots, n^2]\rangle$ and sets
$R'' := -R'[1, \ldots, n] + (x^{-1}(c - d), 0, \ldots, 0)$.

3. Player $P_L$ sets $L'' := L'[1, \ldots, n]$.


Output: The players output $(L'', R'')$.
Views: The view of player $P_L$ is $(L^0, L^1, L', L'', c, \mathrm{view}_L)$ and the view
of player $P_R$ is $(R^0, R^1, R', R'', c, d, x, L'[n+1, \ldots, n^2], \mathrm{view}_R)$,
where $\mathrm{view}_L$ and $\mathrm{view}_R$ are the views from Step 1.


Fig. 2. Protocol $\mathrm{Mult}_F^n$. Note that computing $x^{-1}$ is possible since in our
LRS the first element of $L$ is never equal to 0. This is actually precisely the reason
why this restriction was introduced.
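The algebra of Steps 2 and 3 can be verified with a short sketch. This is an illustration only: the prime `P` and all names are assumptions, and the call to the refreshing sub-protocol in Step 1 is replaced by a hypothetical stand-in that recomputes the encoded value and samples a fresh encoding of it. That stand-in reproduces only the input/output behaviour of the refreshing step; it is neither a two-party protocol nor leakage resilient.

```python
import random

P = 101  # illustrative prime; F = Z_P is an assumption of this sketch


def inner(u, v):
    return sum(x * y for x, y in zip(u, v)) % P


def encode(s, n, rng):
    """Random (L, R) with L[0] != 0 and <L, R> = s."""
    L = [rng.randrange(1, P)] + [rng.randrange(P) for _ in range(n - 1)]
    R = [rng.randrange(P) for _ in range(n)]
    R[0] = (s - inner(L[1:], R[1:])) * pow(L[0], -1, P) % P
    return L, R


def tensor(u, v):
    """Flattened tensor product; <u (x) u', v (x) v'> = <u, v> * <u', v'>."""
    return [x * y % P for x in u for y in v]


def mult(enc0, enc1, c, rng):
    (L0, R0), (L1, R1) = enc0, enc1
    n = len(L0)
    # Step 1 (stand-in for the refreshing sub-protocol on the tensored shares).
    s = inner(tensor(L0, L1), tensor(R0, R1))
    Lp, Rp = encode(s, n * n, rng)
    # Step 2: P_L reveals x = L'[1] and the tail of L'; P_R folds the tail into d.
    x = Lp[0]
    d = inner(Lp[n:], Rp[n:])
    R2 = [(-r) % P for r in Rp[:n]]
    R2[0] = (R2[0] + pow(x, -1, P) * (c - d)) % P
    # Step 3: P_L keeps the first n coordinates.
    L2 = Lp[:n]
    return L2, R2
```

For example, multiplying encodings of 3 and 4 with constant 20 yields an encoding whose inner product is $20 - 3 \cdot 4 = 8$ modulo `P`.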


5 The Compiler 


5.1 Arithmetic Circuits 


Before describing our general circuit compiler, we must define how to model
arithmetic circuits over finite fields $F$, as these are used to describe the original
circuits. To keep the exposition simple, we consider circuits consisting only of 4
types of gates. The first two types are: the public-input gates that will be used
by the user, or the adversary, to provide the input $X$ to the circuit, and the
private-input gates that will be used to provide the secret input state (e.g., the
cryptographic key) to the scheme. The third type of gate is the multiplication
gate $(a, b, c)$. This gate takes as input the values $A \in F$ and $B \in F$ of two other
gates (indicated by $a$ and $b$, resp.) and a constant $c \in F$, and produces the result
$c - AB$. Note that in particular the "negated and" function over bits can be
expressed by such a gate, as $\overline{A \wedge B} = 1 - AB$ for $A, B \in \{0, 1\}$.
Finally, we also have the output gates. Each output gate takes as input the value of a
gate of a previous type and outputs it. Since it is well known that the NAND gate
is complete, the above suffices to describe any functionality. Formally, a circuit
over a field $F$ is a sequence $\Gamma = (\gamma_1, \ldots, \gamma_t)$, where each
$\gamma_i$ is called a gate. The set of gates is divided into the following groups.


public-input gates: $\gamma_1, \ldots, \gamma_m$; each such gate is equal to a special
symbol pub and takes the inputs provided by the user,

private-input gates: $\gamma_{m+1}, \ldots, \gamma_{m+k}$; each such gate is equal to a
special symbol priv and represents the memory containing the secret state,

multiplication gates: $\gamma_{m+k+1}, \ldots, \gamma_{t-u}$; each such gate $\gamma_i$
($i \in [m+k+1, t-u]$) has the form $(a, b, c)$, where $a, b \in \{1, \ldots, i-1\}$ and
$c \in F$. We say that the outputs of the gates $\gamma_a$ and $\gamma_b$ are inputs for
the gate $\gamma_i$,

output gates: $\gamma_{t-u+1}, \ldots, \gamma_t$; each such gate $\gamma_i$ is equal to
some $j$, where $j \in \{1, \ldots, t-u\}$. We say that $\gamma_j$ is an input for the
gate $\gamma_i$.


For technical reasons, we also assume that the circuit's fan-out is at most 2, more
precisely: each $\gamma_i$ is an input for at most 2 other gates. This can clearly be
assumed without loss of generality. The computation $\mathrm{Comp}(\Gamma, X, state)$ of
such a circuit on input $(X, state) = ((x^1, \ldots, x^m), (s^1, \ldots, s^k))$ is a
sequence $(\xi^1, \ldots, \xi^t)$ of values on the outputs of circuit gates (one may
think of this as the output wires of the gates), defined by the following procedure:


For $i = 1$ to $t$ do:
1. if $\gamma_i$ = pub ("public-input gate") then set $\xi^i := x^i$,
2. if $\gamma_i$ = priv ("private-input gate") then set $\xi^i := s^{i-m}$,
3. if $\gamma_i = (a, b, c)$ ("multiplication gate") then set $\xi^i := c - \xi^a \xi^b$,
4. if $\gamma_i = j$ ("output gate") then set $\xi^i := \xi^j$.


The output of the computation is equal to $(\xi^{t-u+1}, \ldots, \xi^t)$ and will be
denoted by $\Gamma(X, state)$.
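The procedure above translates directly into a small evaluator. The sketch below is only an illustration: the gate representation (the strings `"pub"` and `"priv"`, a 3-tuple for a multiplication gate, an integer for an output gate) and the prime modulus `P` are assumptions of this example, not notation from the paper.

```python
P = 101  # illustrative prime modulus, so F = Z_P (an assumption of this sketch)


def comp(gates, X, state):
    """Return the wire values (xi^1, ..., xi^t) of the circuit `gates`."""
    m = len(X)
    xi = []  # xi[i - 1] holds the value on the output wire of gate i
    for i, gate in enumerate(gates, start=1):
        if gate == "pub":                      # public-input gate
            xi.append(X[i - 1] % P)
        elif gate == "priv":                   # private-input gate
            xi.append(state[i - m - 1] % P)
        elif isinstance(gate, tuple):          # multiplication gate (a, b, c)
            a, b, c = gate
            xi.append((c - xi[a - 1] * xi[b - 1]) % P)
        else:                                  # output gate: copies wire j
            xi.append(xi[gate - 1])
    return xi


def evaluate(gates, X, state):
    """Return the values on the u output gates, i.e. Gamma(X, state)."""
    u = sum(isinstance(g, int) for g in gates)
    xi = comp(gates, X, state)
    return xi[len(xi) - u:]
```

With `nand = ["pub", "priv", (1, 2, 1), 3]`, the single multiplication gate computes $1 - AB$, so the circuit is the NAND of one public and one private bit.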


5.2 Protocols Computing Circuits 


Recall the definition of a protocol from Sect. 2.1. In this section we consider a
special type of such protocols, that we call LRS-protocols. Each such protocol
$\Pi_\Phi$ is parameterized by an LRS
$\Phi = (\mathrm{Encode} : \mathcal{M} \to \mathcal{L} \times \mathcal{R},\ \mathrm{Decode} : \mathcal{L} \times \mathcal{R} \to \mathcal{M})$
(we will say that $\Pi_\Phi$ works over $\Phi$). It consists of $2t$ parties
$\mathcal{P} = \{P_L^1, \ldots, P_L^t, P_R^1, \ldots, P_R^t\}$. The parties are divided
into the following groups:


"public-input parties": $P_L^1, \ldots, P_L^m, P_R^1, \ldots, P_R^m$; each $P_L^i$ takes
no input and each $P_R^i$ takes as input $x^i \in F$,

"private-input parties": $P_L^{m+1}, \ldots, P_L^{m+k}, P_R^{m+1}, \ldots, P_R^{m+k}$; each
$P_L^i$ takes as input $L^i \in \mathcal{L}$, and each $P_R^i$ takes as input
$R^i \in \mathcal{R}$,

"multiplication parties": $P_L^{m+k+1}, \ldots, P_L^{t-u}, P_R^{m+k+1}, \ldots, P_R^{t-u}$;
they have no inputs or outputs,

"output parties": $P_L^{t-u+1}, \ldots, P_L^t, P_R^{t-u+1}, \ldots, P_R^t$; each $P_R^i$
produces an output $y^i \in \mathcal{M}$, and the $P_L^i$'s produce no output.


The LRS-protocols will be analyzed only under the assumption that for
$i = m+1, \ldots, m+k$ we have that $(L^i, R^i) \leftarrow \mathrm{Encode}(s^{i-m})$ for
some $s^{i-m}$. More precisely, for $X = (x^1, \ldots, x^m) \in F^m$ and
$state = (s^1, \ldots, s^k) \in F^k$ consider the following experiment.


Experiment $\mathrm{ExpExec}(\Pi_\Phi, X, state)$:


1. For each $i = 1, \ldots, m$ give $x^i$ to $P_R^i$.

2. For each $i = 1, \ldots, k$ sample $(L^{m+i}, R^{m+i}) \leftarrow \mathrm{Encode}(s^i)$.
Give $L^{m+i}$ to $P_L^{m+i}$ and $R^{m+i}$ to $P_R^{m+i}$.

3. Run the protocol $\Pi_\Phi$ with the inputs for the players as described in the
previous steps.

4. For $i = 1, \ldots, t$ let $\mathrm{view}_L^i$ be the view of $P_L^i$, and let
$\mathrm{view}_R^i$ be the view of $P_R^i$ in the above execution.
Denote $\mathrm{View}(\Pi_\Phi, (X, state)) := ((\mathrm{view}_L^1, \mathrm{view}_R^1), \ldots, (\mathrm{view}_L^t, \mathrm{view}_R^t))$.

5. Let $\mathrm{Exec}(\Pi_\Phi, (X, state))$ be the vector containing the outputs of the
parties $P_R^{t-u+1}, \ldots, P_R^t$ in the above execution.


5.3 The Security Definition 


We now present the main security definition of this paper. As mentioned in the
introduction, in this definition we consider only non-adaptive security. In
Sect. 6 we show how this definition can be extended to adaptive settings. Let
$\Gamma$ be a circuit with $m$ public-input gates, $k$ private-input gates and $u$ output
gates. Let $\Pi_\Phi$ be an LRS-protocol with $2m$ public-input parties, $2k$
private-input parties and $2u$ output parties. We say that the $\Pi_\Phi$ protocol
$(\lambda, \epsilon)$-securely computes $\Gamma$ if:


- $\Pi_\Phi$ computes $\Gamma$, i.e.: for every $(X, state) \in F^m \times F^k$ we have that
$\mathrm{Exec}(\Pi_\Phi, (X, state)) = \Gamma(X, state)$,


and

- for every $\lambda$-limited adversary $\mathcal{A}$ there exists a simulator $S$, running
in time polynomial in the running time of $\mathcal{A}$, that for every
$(X, state) \in F^m \times F^k$, on input $(X, \Gamma(X, state))$ produces a variable
$S(X, \Gamma(X, state))$ such that


$\Delta\big(S(X, \Gamma(X, state))\ ;\ (\mathcal{A} \rightleftarrows \mathrm{View}(\Pi_\Phi, (X, state)))\big) \le \epsilon$. (3)


Note that $state$ is not given directly to the simulator. The only variables that
he gets are: the public input $X$ and the output $Y = \Gamma(X, state)$. Therefore,
intuitively, the only information that he gets about $state$ comes from $(X, Y)$.
5.4 The Construction


We are now ready to present our construction of the circuit compiler. Our compiler
takes an arithmetic circuit $\Gamma$ and a parameter $n \in \mathbb{N}$ and produces an
LRS-protocol $\Pi^\Gamma_{\Phi_F^n}$ over $\Phi_F^n$. To simplify the notation we will
write $\Pi^\Gamma_n$ instead of $\Pi^\Gamma_{\Phi_F^n}$.


The protocol $\Pi^\Gamma_n$ is depicted in Fig. 3.


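Protocol $(x^{t-u+1}, \ldots, x^t) \leftarrow \Pi^\Gamma_n(x^1, \ldots, x^m, (L^{m+1}, R^{m+1}), \ldots, (L^{m+k}, R^{m+k}))$:


Input $(x^1, \ldots, x^m, (L^{m+1}, R^{m+1}), \ldots, (L^{m+k}, R^{m+k}))$: Give each
$x^i \in F$ to $P_R^i$, each $L^i \in F^n$ to $P_L^i$ and each $R^i \in F^n$ to $P_R^i$.


1. For $i = 1, \ldots, m$ player $P_R^i$ computes
$(L^i, R^i) \leftarrow \mathrm{Encode}_F^n(x^i)$ and sends $L^i$ to $P_L^i$. The view
$\mathrm{view}_L^i$ of $P_L^i$ is $L^i$ and the view $\mathrm{view}_R^i$ of $P_R^i$ is
$(L^i, R^i)$.
2. For $i = m+1, \ldots, m+k$ the view $\mathrm{view}_L^i$ of $P_L^i$ is $L^i$ and the view
$\mathrm{view}_R^i$ of $P_R^i$ is $R^i$.
3. For $i = m+k+1, \ldots, t-u$ let $(a, b, c)$ be such that $\gamma_i = (a, b, c)$.
   Player $P_L^a$ sends $L^a$ to $P_L^i$.
   Player $P_R^a$ sends $R^a$ to $P_R^i$.
   Player $P_L^b$ sends $L^b$ to $P_L^i$.
   Player $P_R^b$ sends $R^b$ to $P_R^i$.
   Players $P_L^i$ and $P_R^i$ execute the $\mathrm{Mult}_F^n((L^a, R^a), (L^b, R^b), c)$
   protocol. Let $L^i$ and $R^i$ be the respective outputs of the players at the end of
   this protocol, and let $\mathrm{view}_L^i$ and $\mathrm{view}_R^i$ be their respective
   views.
4. For $i = t-u+1, \ldots, t$ let $j$ be such that $\gamma_i = j$.
   Player $P_L^j$ sends $L^j$ to $P_L^i$.
   Player $P_R^j$ sends $R^j$ to $P_R^i$.
   The players $P_L^i$ and $P_R^i$ execute the $\mathrm{Refresh}_F^n(L^j, R^j)$ protocol.
   Let $L^i$ and $R^i$ be the respective outputs of the players at the end of this
   protocol, and let $\widetilde{\mathrm{view}}_L^i$ and $\widetilde{\mathrm{view}}_R^i$ be
   their respective views.
   Player $P_L^i$ sends $L^i$ to $P_R^i$. Player $P_R^i$ computes
   $x^i := \mathrm{Decode}_F^n(L^i, R^i)$ and outputs it. The view $\mathrm{view}_L^i$ of
   $P_L^i$ is $\widetilde{\mathrm{view}}_L^i$ and the view $\mathrm{view}_R^i$ of $P_R^i$ is
   $(\widetilde{\mathrm{view}}_R^i, L^i)$.


Fig. 3. The $\Pi^\Gamma_n$ protocol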


We now have the following theorem. Its proof is based on the hybrid argument 
and appears in the full version of this paper. 


Theorem 1. Assume that for some $n$ the LRS $(\mathrm{Encode}_F^n, \mathrm{Decode}_F^n)$ is
$(\lambda, \epsilon)$-secure for some $\lambda$ and $\epsilon$. Then for any $\Gamma$ the
$\Pi^\Gamma_n$ protocol $(\lambda/3 - \log_2 |F|, t\epsilon)$-securely computes $\Gamma$.


The following is an example of the application of Thm. 1 for a concrete LRS.


Corollary 2. Suppose $|F| = \Omega(n)$. Then for any $\Gamma$ the $\Pi^\Gamma_n$ protocol
$(0.16 \cdot \log_2 |F^n| - 1 - \log_2 |F|, \mathrm{negl}(n))$-securely computes $\Gamma$,
for some negligible function $\mathrm{negl}$.
6 Extensions


The model in Sect. 5 was intentionally kept simple in order to make the proof as
easy as possible, and to satisfy the page limit. In this section we present several
generalizations and extensions of this model. The formal security definitions and
proofs will be presented in the extended version of this paper.


ADAPTIVE SECURITY. Most cryptographic security definitions assume that
the adversary is adaptive, meaning that he can interact with the cryptographic
device in rounds, and his queries in the $i$th round may depend on the answers
that he got in rounds $1, \ldots, i-1$. Our model from Sect. 5 obviously does not
cover this scenario. We now briefly argue how to extend the model and the
protocol to cover adaptive security as well. In the adaptive model one assumes
that the circuit $\Gamma$ is initialized with some secret $state \in F^k$ and that it can
be queried adaptively on several inputs $X^1, \ldots, X^\ell$ (where $\ell$ is the number of
rounds). To each such query the circuit responds with $Y^i := \Gamma(X^i, state)$. The
input $X^i$ is placed on the "public-input gates" at the beginning of each round, and
the output $Y^i$ appears on the "output gates".

The adaptive protocol that "computes $\Gamma$" consists of $2t$ parties, whose role is
exactly as in the protocol in Sect. 5. In particular: the "private-input parties" are
initialized with an encoding of $state$, the "public-input parties" in the $i$th round
take $X^i$ as input, and the output $Y^i$ is produced by the "output parties". After
the end of each round the memory of all the parties (except the "private-input
parties" that hold the encoding of $state$) gets erased. The adversary $\mathcal{A}$ can
adaptively choose the $X^i$'s and leak at most $\lambda$ bits from each party in each
round of the computation of the protocol on input $X^i$. The security definition assumes
that the simulator $S$ gets the pairs $\{(X^i, Y^i)\}_{i=1}^{\ell}$, one for each round,
and his goal is to produce an output that is statistically close to the output of
$\mathcal{A}$.

The implementation of the adaptive protocol is similar to the implementation of
$\Pi^\Gamma_n$ from Sect. 5. In particular, the protocols for the parties in a single
round are the same as before. The only change is that, since $state$ does not change
between the rounds, the "private-input parties" need to refresh the encodings that
they hold. This can be done easily with the $\mathrm{Refresh}_F^n$ protocol from
Sect. 4.1: each pair $(P_L^i, P_R^i)$ of "private-input parties" applies, at the end of
each round, the refreshing protocol to their encoding $(L^i, R^i)$, setting
$(L^i, R^i) := \mathrm{Refresh}_F^n(L^i, R^i)$. The security proof goes along the same
lines as the proof of Thm. 1. It will be provided in the extended version of this paper.


MORE GENERAL CIRCUITS. The circuits that we consider in Sect. 5 have a very
restricted form in order to make the proof of Thm. 1 as simple as possible. We
now argue how some of these restrictions can be avoided. First, observe that we
can consider circuits with fan-out $q > 2$. The only price to pay is that the leakage
bound in the statement of Thm. 1 changes from "$\lambda/3 - \log_2 |F|$" to
"$\lambda/(q+1) - \log_2 |F|$". This is because now each $(L^i, R^i)$ is given to at most
$q + 1$ parties (not just 3 parties as before).

For some applications it may also be useful to have a separate procedure for
adding values in a leakage-resilient way. First, observe that adding a publicly
known constant $c$ to an encoded secret can be done easily, as depicted in Fig. 4
(protocol $\mathrm{AddConst}_F^n$). In fact, this protocol has already been described
earlier and used (implicitly) in protocol $\mathrm{Mult}_F^n$ (cf. Fig. 2, Step 2). The
protocol computing the sum of two encoded secrets is also presented in Fig. 4.
Correctness of these protocols is a simple calculation. Because of the lack of space,
the formal proof of their security properties is moved to the full version of this paper.


Protocol $(L', R') \leftarrow \mathrm{AddConst}_F^n((L, R), c)$:


Input $((L, R), c)$: $L \in (F \setminus \{0\}) \times F^{n-1}$ is given to $P_L$,
$R \in F^n$ is given to $P_R$, and $c \in F$ is given to both players.


1. Player $P_L$ sends $x := L[1]$ to $P_R$.
2. Player $P_R$ computes $R := R + (x^{-1} c, 0, \ldots, 0)$.
3. The players execute the $\mathrm{Refresh}_F^n(L, R)$ procedure. Let $(L', R')$ be the
result.


Output: The players output $(L', R')$.


Protocol $(L', R') \leftarrow \mathrm{Add}_F^n((L^0, R^0), (L^1, R^1))$:


Input $((L^0, R^0), (L^1, R^1))$: $L^0, L^1 \in (F \setminus \{0\}) \times F^{n-1}$ are
given to $P_L$ and $R^0, R^1 \in F^n$ are given to $P_R$.


1. Player $P_L$ sets $A := L^0$ and $C := L^1 - L^0$.

2. Player $P_R$ sets $B := R^0 + R^1$ and $D := R^1$.
Note that $\langle A, B\rangle + \langle C, D\rangle = \langle L^0, R^0\rangle + \langle L^1, R^1\rangle$.

3. Refresh $(C, D)$ by $(C', D') \leftarrow \mathrm{Refresh}_F^n(C, D)$.

4. Compute $c := \mathrm{Decode}_F^n(C', D')$.
Note that this does not reveal any information about the inputs of the
protocol, as $(C', D')$ were "refreshed".

5. Set $(L', R') \leftarrow \mathrm{AddConst}_F^n((A, B), c)$.


Output: The players output $(L', R')$.


Fig. 4. Protocols $\mathrm{AddConst}_F^n$ and $\mathrm{Add}_F^n$
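The "simple calculation" behind the correctness of both protocols can be replayed in code. The following is a hedged sketch: the prime `P` and every name are assumptions of this example, and `reencode` is a hypothetical stand-in that only mimics the input/output behaviour of the refreshing step (a fresh encoding of the same value). It is not the two-party refreshing protocol and offers no leakage resilience.

```python
import random

P = 101  # illustrative prime; F = Z_P is an assumption of this sketch


def inner(u, v):
    return sum(x * y for x, y in zip(u, v)) % P


def reencode(L, R, rng):
    """Stand-in for the refreshing step: a fresh random encoding of <L, R>."""
    n, s = len(L), inner(L, R)
    L2 = [rng.randrange(1, P)] + [rng.randrange(P) for _ in range(n - 1)]
    R2 = [rng.randrange(P) for _ in range(n)]
    R2[0] = (s - inner(L2[1:], R2[1:])) * pow(L2[0], -1, P) % P
    return L2, R2


def add_const(L, R, c, rng):
    x = L[0]                                   # Step 1: P_L reveals x = L[1]
    R = R[:]
    R[0] = (R[0] + pow(x, -1, P) * c) % P      # Step 2: adds x^{-1} * c, so <L, R> grows by c
    return reencode(L, R, rng)                 # Step 3


def add(enc0, enc1, rng):
    (L0, R0), (L1, R1) = enc0, enc1
    A = L0                                     # Step 1
    C = [(b - a) % P for a, b in zip(L0, L1)]
    B = [(a + b) % P for a, b in zip(R0, R1)]  # Step 2
    D = R1
    C2, D2 = reencode(C, D, rng)               # Step 3
    c = inner(C2, D2)                          # Step 4: decode the refreshed pair
    return add_const(A, B, c, rng)             # Step 5
```

The identity used in Step 2 holds because the cross terms $\langle L^0, R^1\rangle$ cancel, so the final encoding decodes to the sum of the two secrets.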


DEALING WITH SMALL FIELDS. A natural field over which one could use our
compiler is $\mathbb{Z}_2$. The problem here is that we assumed that in our encoding we
have $L[1] \neq 0$, and in the refreshing protocol, if this condition is not met, then
part of the protocol is restarted (cf. Fig. 1). Of course, if $F$ is small then this
restarting can happen with high probability. To avoid this problem one could
change the underlying encoding scheme and require that some prefix of $L$ of
length $\alpha = \omega(\log_{|F|}(n))$ (instead of just $L[1]$) is not equal to
$(0)^\alpha$. In this way the probability of restarting is at most $|F|^{-\alpha}$ and
hence it is negligible in $n$. The other change that is also needed in this case is that
in Step 2 of the $\mathrm{Mult}_F^n$ protocol the player $P_L$ needs to send
$L[1, \ldots, \alpha]$ (instead of $L[1]$) to $P_R$. The price to pay
for it is that the “— |F|” term in the leakage bound needs to be replaced by 2°. 


SMALLER NUMBER OF PARTIES. Recall that the number of parties in the protocol
$\Pi^\Gamma_n$ corresponds to the number of independent memory parts in the real
implementation of the scheme. In our model this number is linear ($2t$) in the
number $t$ of the gates of $\Gamma$. This can be reduced in the following way. First,
observe that some parties can be "reused" if we look at the computation of $\Gamma$
as a procedure that evaluates $\Gamma$ gate-by-gate (cf. Sect. 5.1). More precisely: if a
given gate $\gamma_i$ is not used anymore as an input to other gates, then the memory
of the party $P^i$ that corresponds to $\gamma_i$ can be erased and $P^i$ can be "assigned"
to some other gate. Hence, we can reduce the number of parties to $2t'$, where $t'$
is the width of $\Gamma$. Here, by the "width" of a circuit we mean the minimal number
of gates that need to be kept in memory in order to compute $\Gamma$.

Observe also that we can actually decrease the number of memory parts even
to two (call these parts $\mathcal{L}$ and $\mathcal{R}$), by placing all $P_L^i$'s on
$\mathcal{L}$, and all $P_R^i$'s on $\mathcal{R}$. This, however, comes at a price: the
leakage bound of $\mathcal{L}$ and $\mathcal{R}$ still needs to be a constant fraction of
$|n|$, and hence it is a $\frac{c}{t'} \cdot |\mathcal{L}|$ (where $c$ is a constant and
$t'$ is the width of $\Gamma$), and the fraction $\frac{c}{t'}$ gets very small for large
$t'$. Hence it is mostly of theoretical interest.


Acknowledgments. The authors wish to thank Marcin Andrychowicz for point- 
ing out some errors in an earlier version of this paper. 
Abstract. A leakage resilient encryption scheme is one which stays secure
even against an attacker that obtains a bounded amount of side
information on the secret key (say $\lambda$ bits of "leakage"). A fundamental
question is whether parallel repetition amplifies leakage resilience.
Namely, if we secret share our message, and encrypt the shares under
two independent keys, will the resulting scheme be resilient to $2\lambda$ bits of
leakage?

Surprisingly, Lewko and Waters (FOCS 2010) showed that this is false.
They gave an example of a public-key encryption scheme that is (CPA)
resilient to $\lambda$ bits of leakage, and yet its 2-repetition is not resilient to
even $(1 + \epsilon)\lambda$ bits of leakage. In their counter-example, the repeated
schemes share secretly generated public parameters.

In this work, we show that under a reasonable strengthening of the 
definition of leakage resilience (one that captures known proof techniques 
for achieving non-trivial leakage resilience), parallel repetition does in 
fact amplify leakage (for CPA security). In particular, if fresh public 
parameters are used for each copy of the Lewko-Waters scheme, then 
their negative result does not hold, and leakage is amplified by parallel 
repetition. 

More generally, given $t$ schemes that are resilient to $\lambda_1, \ldots, \lambda_t$ bits
of leakage, respectively, we show that their direct product is resilient
to $\sum_i (\lambda_i - 1)$ bits. We present our amplification theorem in a general
framework that applies to other cryptographic primitives as well.


1 Introduction 


In recent years, motivated by a large variety of real-world physical attacks, there 
has been a major effort by the cryptographic community to construct schemes 
that are resilient to leakage from the secret keys. This successful line of work gave 
rise to many constructions of leakage-resilient cryptographic primitives, including 
stream ciphers [I] [19], signature schemes [15] [2], symmetric and public-key 
encryption schemes [9], as well as more complicated primitives. 

A natural question to ask is: Does parallel repetition amplify leakage? More
concretely, suppose we are given a public-key encryption scheme $\mathcal{E}$ that remains
secure even if $\lambda$ bits about the secret key are leaked. Is it possible to amplify
the leakage resilience to $t\lambda$ by taking $t$ copies of $\mathcal{E}$, and encrypting
a message $m$ by secret sharing it, and encrypting the $i$-th share using
$\mathcal{E}_i$ (we denote the resulting scheme by $\mathcal{E}^t$)? Using an appropriate
definition of parallel repetition, a similar question can be asked for signatures.
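The repeated scheme described above can be sketched as follows. This is a minimal illustration under explicit assumptions: the sharing is taken to be XOR-based $t$-out-of-$t$ sharing (the text does not fix a particular sharing scheme here), and the per-copy encryption algorithms are passed in as hypothetical callables rather than taken from any real library.

```python
import secrets


def share(message, t):
    """Split `message` (bytes) into t shares whose XOR equals the message."""
    shares = [secrets.token_bytes(len(message)) for _ in range(t - 1)]
    last = message
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]


def reconstruct(shares):
    """XOR all shares together to recover the message."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out


def encrypt_repeated(message, encryptors):
    """Encrypt the i-th share with the i-th copy's encryption function.
    `encryptors` is a list of hypothetical callables, one per public key."""
    shares = share(message, len(encryptors))
    return [enc(s) for enc, s in zip(encryptors, shares)]


def decrypt_repeated(ciphertexts, decryptors):
    """Decrypt every share with its own copy's key, then recombine."""
    return reconstruct([dec(c) for dec, c in zip(decryptors, ciphertexts)])
```

Any $t - 1$ of the shares are uniformly random on their own, which is why an attacker must learn something about every one of the $t$ secret keys to recover $m$.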

Alwen, Dodis, and Wichs [3] and Alwen, Dodis, Naor, Segev, Walfish and 
Wichs [2] were able to amplify leakage resilience for particular schemes, using 
the specific properties of these schemes. They raised the fundamental question of 
whether leakage resilience can always be amplified by parallel repetition. They 
predicted that such a result will be hard or even impossible to prove under the 
known definitions. 

Recently, Lewko and Waters [16] gave a striking negative result, giving an
example of a public-key encryption scheme that is resilient to $\lambda$ bits of leakage
but whose 2-repetition is not resilient to even $(1 + \epsilon)\lambda$ bits. This was
followed by a work of Jain and Pietrzak [14], who presented a signature scheme where
increasing the number of repetitions does not improve the leakage resilience at
all. We elaborate on these negative results (and on how they go hand-in-hand
with our positive results) in Section 1.2.


1.1 Our Results 


We give positive results, by proving direct product theorems for leakage re- 
silience. In particular, we show that parallel repetition does amplify the leakage 
resilience (almost) as expected. 

The leakage model we consider is based on the "noisy leakage" model of Naor
and Segev.¹ In this model, "legal" leakage functions are poly-size circuits
that reduce the min-entropy of the secret key by at most $\lambda$. A scheme is said
to be $\lambda$-leakage resilient if every PPT adversary that asks for a "legal" leakage
function breaks the scheme with only negligible probability.

In this work, we consider a slightly relaxed leakage model. Instead of requiring
the leakage function to always reduce the min-entropy of $sk$ by at most $\lambda$, we
require that it should be hard to break the scheme on those leakage values that
do reduce the min-entropy by at most $\lambda$. In other words, we consider a point-wise
definition: We say that a scheme is point-wise $\lambda$-leakage resilient if for any PPT
adversary $\mathcal{A}$ that asks for a poly-size leakage function $L$, the probability
that both the leakage value $y \leftarrow L(pk, sk)$ reduces the min-entropy of $sk$ by at
most $\lambda$, and that $\mathcal{A}(pk, y)$ breaks the scheme, is negligible.
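The difference between bounding the leakage function as a whole and judging each leakage value separately can be seen on a toy example. The sketch below is purely information-theoretic and makes several simplifying assumptions that are not in the text: the secret key is drawn from a tiny explicit distribution, the public key is ignored, and all function names are invented for the illustration.

```python
import math
from collections import defaultdict


def min_entropy(dist):
    """Min-entropy (in bits) of a distribution given as {outcome: probability}."""
    return -math.log2(max(dist.values()))


def posteriors(prior, leak):
    """Map each leakage value y to the distribution of sk given leak(sk) == y."""
    buckets = defaultdict(dict)
    for sk, p in prior.items():
        buckets[leak(sk)][sk] = p
    result = {}
    for y, bucket in buckets.items():
        total = sum(bucket.values())
        result[y] = {sk: p / total for sk, p in bucket.items()}
    return result


def legal_values(prior, leak, lam):
    """For each leakage value y, report whether it costs at most `lam` bits
    of min-entropy, i.e. whether it counts in the point-wise definition."""
    h0 = min_entropy(prior)
    return {y: h0 - min_entropy(post) <= lam
            for y, post in posteriors(prior, leak).items()}


# Toy run: sk uniform over 3-bit keys; the leakage says whether sk == 0.
prior = {sk: 1 / 8 for sk in range(8)}
verdict = legal_values(prior, lambda sk: 0 if sk == 0 else 1, lam=1)
```

In this run the one-bit leakage function is treated differently on its two outputs: the value `0` pins down the key completely (a loss of 3 bits, so it does not count for $\lambda = 1$), while the value `1` costs only $3 - \log_2 7 \approx 0.19$ bits.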

We believe that this leakage model is of independent interest, as it captures 
our “intent” better: As long as the secret key is left with enough min-entropy, 
the scheme is secure. Moreover, we note that all known constructions that are 
$\lambda$-leakage resilient are also point-wise $\lambda$-leakage resilient (including [18] 15491 [5]).
We elaborate on this in Section 4.

At first it may seem that point-wise leakage is equivalent to noisy leakage.
However, the difficulty is that it may be hard to determine whether a leakage
value $y \leftarrow L(pk, sk)$ indeed reduces the min-entropy of $sk$ by at most
$\lambda$. If this could be determined efficiently, then indeed we would have a reduction
between the two models.


¹ While "entropic leakage" may be a more suitable name for this model, we stick with
the terminology of Naor and Segev for historic reasons.

For technical reasons (see Section 1.3), we need to further relax our leakage
model for our results to go through. We consider two (incomparable) relaxations.


First Relaxation: Almost $\lambda$-Leakage. In the first relaxation, instead of requiring
that $sk$ has high min-entropy (given $pk$, $y$), we require that it is statistically close
to a random variable with high min-entropy. A scheme that is secure in this
model is said to be point-wise almost $\lambda$-leakage resilient. We can prove a direct
product theorem for any constant number of repetitions under this definition.


Theorem 1. Let $c \in \mathbb{N}$ be a constant, and for every $i \in [c]$, let
$\mathcal{E}_i$ be a point-wise almost $\lambda_i$-leakage-resilient public-key encryption
scheme. Then, $\mathcal{E}_1 \times \ldots \times \mathcal{E}_c$ is point-wise almost
$\lambda$-leakage-resilient, where $\lambda = \sum_{i=1}^{c} (\lambda_i - 1)$.


We refer the reader to Section 1.3 and Section 5 for more details.


Second Relaxation: Leakage with Small Advice. In the second relaxation, we give
the adversary an additional logarithmic (in the security parameter) number of
bits of (possibly hard to compute) advice (quite surprisingly, we were unable to
reduce this model to the point-wise $\lambda$-leakage model). A scheme that is secure
in this model is said to be point-wise $\lambda$-leakage resilient with logarithmic advice.
We can prove a direct product theorem for any polynomial number of repetitions
under this definition.

We note that it is not clear what it means to have $t$ different leakage resilient
schemes when $t$ is super-constant, since there is a different number of schemes
for each value of the security parameter. While one can come up with a proper
definition (involving a generation algorithm that, for every value of the security
parameter, gets $i$ and implements $\mathcal{E}_i$), for the sake of clarity, we choose to
state the theorem below only for parallel repetition of the same scheme.


Theorem 2. Let $t = t(k)$ be a polynomial in the security parameter. Let $\mathcal{E}$ be
a public-key encryption scheme that is point-wise $\lambda$-leakage resilient with
logarithmic advice. Then $\mathcal{E}^t$ is point-wise $t(\lambda - 1)$-leakage resilient with
logarithmic advice.


We refer the reader to Section 1.3 for an overview of the proof, and to Section 6
for more details.


The Relation Between our Models. Interestingly, we are not able to show that
our relaxations are equivalent to one another, nor to show that they are implied
by (plain) point-wise leakage resilience. This is surprising since in the bounded
leakage model² a negligible change in the secret-key distribution, or adding
a logarithmic number of hard to compute bits, does not change the model.
In a nutshell, the reason that this does not carry over to our models is that having
high min-entropy is not an efficiently verifiable condition, and that statistical
indistinguishability does not preserve min-entropy.


² Where the leakage function's output is required to be bounded by $\lambda$ bits, as
opposed to our requirement that the secret key has high residual entropy.


We are able to show, however, that point-wise $\lambda$-leakage resilience implies
$\lambda$-bounded leakage resilience (for the same value of $\lambda$), and thus in particular,
our relaxed models also imply bounded leakage resilience. We note that proving the
above is somewhat nontrivial since we do not want to suffer a degradation in $\lambda$.
We refer the reader to Section B.3] for a formal presentation. 


Our Models and Current Proof Techniques. We show that for essentially all
known schemes that are resilient to non-trivial leakage (i.e. super-logarithmic
in the hardness of the underlying problem), amplification of leakage resilience
via parallel repetition works. Specifically, this includes the Lewko-Waters
counterexample, if the public parameters are chosen independently for each copy of
the scheme. In order to do this, we identify a proof template that is used in all
leakage resilience proofs, and show that this template is strong enough to prove
point-wise leakage resilience, as well as our relaxed notions. See Section 4 for the
full details.

The Lewko-Waters counterexample uses its public parameters in a very par- 
ticular way that makes the argument not go through (see below). 


1.2 Prior Work 


As we claimed above, all known leakage resilient schemes are proved using the 
same proof template, and remain secure under our leakage models. This implies 
that parallel repetition should amplify security for all known schemes, which 
does not seem to coincide with the negative results of [16] [14]. We explain this 
alleged discrepancy below. 


The Lewko-Waters Counterexample. Lewko and Waters [16] construct a public
key encryption scheme that is resilient to non-trivial length-bounded leakage, and 
prove that parallel repetition does not amplify its leakage resilience. However, 
the copies of their encryption scheme share public parameters: They are all using 
the same bilinear group. Their scheme, like all other schemes we are aware of, 
is (computationally indistinguishable from) point-wise leakage resilient and our 
theorems imply that parallel repetition does amplify its resilience to leakage. 
This is true so long as the public parameters are generated anew for each copy 
of the scheme: In our proof, we need to be able to sample key pairs for the 
scheme in question. Lewko and Waters use the public parameters in an extremely 
pathological (and clever!) way: The public parameters make it possible to generate keys for
their actual scheme, but not for the computationally indistinguishable scheme 
where leakage resilience is actually proven. However, if we consider the generation 
of public parameters as a part of the key generation process, then new key pairs 
can always be generated, and parallel repetition works.

The Jain-Pietrzak Counterexample. Jain and Pietrzak [14] give a negative result
for signature schemes. They take any secure signature scheme and change it so 
that if the message to be signed belongs to a set H, then the signature algorithm 
simply outputs the entire secret key. The set H is computationally hard to hit 
(given only the public key), and thus the scheme remains secure. It follows that 
the scheme remains secure also given leakage of length O(log k), where k is the 
security parameter (more generally, if the underlying problem is $2^{\lambda}$ hard, then
the scheme is resilient to $\approx \lambda$ bits of leakage).

They prove that parallel repetition fails, by proving that if the scheme is
repeated $t$ times, for some large enough $t$, then the leakage can in fact give
enough information to find a message $m$ that belongs to all the sets $H_i$, and
thus break security completely. They start with a result that relies on common
public parameters: a common (seeded) hash function. Then, they suggest
removing this public parameter by replacing the seeded hash function with an
explicit hash function, such as SHA256. However, this explicit hash function is
also, in some sense, a joint non-uniform public parameter.

This counterexample heavily relies on the “help” of the signing oracle when 
breaking the repeated scheme. The paper also presents a construction of a CCA 
encryption scheme, where they use the decryption oracle to break the parallel 
repetition system. 

In general, signature schemes are not covered by our amplification theorems. 
Our theorems (and proofs) only cover public key primitives where the challenger 
in the security game does not need to know the secret key (beyond providing 
the adversary with the leakage value). Our results do extend to schemes such as 
signature schemes or CCA encryption schemes, if they have the property that 
the challenger (i.e., the signing oracle or the decryption oracle) can be efficiently 
simulated given only the public key (or given very little information about the 
secret key), in a way that is computationally indistinguishable even given the 
leakage. For example, the signature scheme of Katz and Vaikuntanathan [15]
has this property, and thus its leakage resilience is amplified by parallel repeti- 
tion. Whether our techniques can be applied to other leakage resilient signature 
schemes (e.g. Æ [17] [13]) is an interesting question that we leave for further 
research. 


1.3 Overview of Our Techniques 


In what follows we give a high-level overview of our proofs. For the sake of 
simplicity, we focus on the case of two-fold parallel repetition. Let $\mathcal{E}$ be any
$\lambda$-leakage resilient encryption scheme. Our goal is to prove that the scheme $\mathcal{E}^2$
is $2\lambda$-leakage resilient. For technical reasons, in our actual proof, we manage to
show that $\mathcal{E}^2$ is $(2\lambda - 1)$-leakage resilient (in both our leakage models).

Our proof is by reduction: Suppose there exists an adversary $\mathcal{B}$ for the parallel
repetition scheme $\mathcal{E}^2$ that leaks $L(pk_1, pk_2, sk_1, sk_2)$, where $L$ reduces the
min-entropy of $(sk_1, sk_2)$ by at most $2\lambda - 1$. We construct an adversary $\mathcal{A}$ that
uses $\mathcal{B}$ to break the security of $\mathcal{E}$, and uses a leakage function $L'$ that reduces the
min-entropy of the secret key by at most $\lambda$.

Intuitively (and, as we will show, falsely), it does not seem too hard to show
such a reduction. It only makes sense that when the pair $(sk_1, sk_2)$ loses $2\lambda$ bits
of entropy, then at least one of the secret keys $sk_1, sk_2$ “loses” at most $\lambda$ bits
(otherwise the total loss should be more than $2\lambda$). Therefore the adversary $\mathcal{A}$
can sample a key pair by itself and “plant” it either as $(pk_1, sk_1)$ or as $(pk_2, sk_2)$
(at random). Namely, $\mathcal{A}$ will sample a random $i \in \{1,2\}$ and uniformly sample
$(pk_i, sk_i)$; the key pair of the scheme we actually wish to attack will play the role
of $(pk_{3-i}, sk_{3-i})$. Upon receiving a leakage function $L(\cdot)$ from $\mathcal{B}$, the adversary
$\mathcal{A}$ will plug the known $(pk_i, sk_i)$ into the function and thus obtain $L'$ to be sent
to the challenger. Upon receiving a response from the challenger, it is forwarded
back to $\mathcal{B}$, which can then break security with noticeable probability. Notice that
$\mathcal{B}$’s view in the game is identical to its view in the repeated game against $\mathcal{E}^2$,
and thus it still breaks the security with the same probability. The only worry
is whether the function $L'$ only reduces the key entropy by the allowed amount,
which is unfortunately not the case. Assume that $L$ leaks some $2\lambda$ bits on the
bit-wise XOR $sk_1 \oplus sk_2$. Then when plugging in a known $sk_i$, the resulting $L'$
still leaks $2\lambda$ bits on $sk_{3-i}$.
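The XOR example can be checked on toy parameters. The sketch below is only an illustration: the key length, the number of leaked bits and the names `joint_leak` and `planted_leak` are assumptions made here, not part of the construction.

```python
import secrets

KEY_BITS = 32    # toy key length (assumption, for illustration only)
LEAK_BITS = 8    # plays the role of 2*lambda
MASK = (1 << LEAK_BITS) - 1

def joint_leak(sk1: int, sk2: int) -> int:
    """L: leaks the low LEAK_BITS bits of sk1 XOR sk2."""
    return (sk1 ^ sk2) & MASK

def planted_leak(known_sk: int):
    """L': the function obtained after hard-wiring a known key into L."""
    return lambda sk: joint_leak(known_sk, sk)

sk_own = secrets.randbits(KEY_BITS)      # sampled by the reduction, hence known to it
sk_target = secrets.randbits(KEY_BITS)   # held by the challenger

y = planted_leak(sk_own)(sk_target)      # the leakage value returned by the challenger
recovered = (y ^ sk_own) & MASK          # strip off the known key
```

Here `recovered` equals the low `LEAK_BITS` bits of `sk_target`: the XOR alone says nothing about either key in isolation, but once one key is known all leaked bits fall on the other.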

To solve this problem, we must prevent $\mathcal{A}$ from knowing $sk_i$. This is achieved
by having the key pair $(pk_i, sk_i)$ sampled by the leakage function $L'$, rather than
by $\mathcal{A}$. Namely, $L'(pk, sk)$ is now defined as follows: First, sample $(pk_i, sk_i)$ and
set $(pk_{3-i}, sk_{3-i}) = (pk, sk)$. Then run $y \leftarrow L(pk_1, pk_2, sk_1, sk_2)$ to obtain the
leakage value. Lastly, output $(y, pk_1, pk_2)$. Given the output of $L'$, the adversary
$\mathcal{A}$ can forward the value $y$ to $\mathcal{B}$, which uses it to break the scheme, all without
ever being exposed to the value of $sk_i$.

This seems to give $\mathcal{A}$ the least amount of information possible, so we should
hope that now we can prove that the entropy of $sk$ is reduced by at most $\lambda$.
However, again, this is not true. Suppose that with probability 1/2, the leakage
function $L$ outputs $2\lambda$ bits about $sk_1$ and with probability 1/2 it outputs $2\lambda$
bits about $sk_2$. In this case, $L$ indeed reduces the min-entropy of $(sk_1, sk_2)$ by
$2\lambda$, and yet for every $i \in \{1,2\}$ the leakage function $L'(pk, sk)$ reduces the
min-entropy of $sk$ by essentially $2\lambda$ as well, and thus is not a valid leakage function
for the one shot game.
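This second failure can be verified by brute force on a tiny key space. The sketch below makes several assumptions of its own: entropy loss is measured by average conditional min-entropy, keys are uniform `N`-bit strings, and "bits about a key" means its low `M` bits, with `M` standing in for $2\lambda$.

```python
from fractions import Fraction
from itertools import product
from math import log2

N = 6   # toy key length in bits (assumption)
M = 4   # number of leaked bits, plays the role of 2*lambda

def avg_min_entropy(joint):
    """joint maps (x, y) -> Pr[X = x and Y = y]; returns -log2 sum_y max_x Pr[x, y]."""
    best = {}
    for (x, y), p in joint.items():
        best[y] = max(best.get(y, Fraction(0)), p)
    return -log2(sum(best.values()))

def add(joint, x, y, p):
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + p

keys = range(2 ** N)
low = lambda sk: sk % (2 ** M)
p = Fraction(1, 2 ** (2 * N) * 2)   # probability of one (key, key, coin) triple

# Joint leakage L: a fair coin c picks which key has its low M bits leaked.
joint_pair = {}
for sk1, sk2, c in product(keys, keys, (1, 2)):
    add(joint_pair, (sk1, sk2), (c, low(sk1 if c == 1 else sk2)), p)

# Derived leakage L': sk sits in slot 2, slot 1 holds a fresh key that stays hidden.
joint_single = {}
for fresh, sk, c in product(keys, keys, (1, 2)):
    add(joint_single, sk, (c, low(fresh if c == 1 else sk)), p)

loss_pair = 2 * N - avg_min_entropy(joint_pair)    # loss on (sk1, sk2)
loss_single = N - avg_min_entropy(joint_single)    # loss on sk alone
```

Under these assumptions the pair loses exactly `M` bits, while the single key loses $\log_2((2^M + 1)/2)$ bits, which is roughly `M - 1` and well above the "fair share" `M / 2`.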

This abnormality results, to some extent, from using min-entropy (as opposed
to Shannon entropy) as our entropy measure: If $L'(pk, sk)$ outputs both
$y = L(pk_1, pk_2, sk_1, sk_2)$ and $sk_i$, then it would indeed leak at most $\lambda$ bits on $sk$
(with probability 1/2). The fact that we have less information, namely that $sk_i$ is
not known, might actually decrease the min-entropy of the key.

We arrive at a conflict: On one hand, knowing $sk_i$ is a problem, but on the
other, not knowing it seems to also be a problem. We show that revealing $sk_i$
only in some cases makes it possible to prove parallel repetition. We use a simple
lemma (Lemma A.1), which essentially shows how to “split-up” the joint min-entropy
of two random variables. More precisely, it says that there is a subset $S$ of all
possible secret keys $sk_1$, such that for every $sk_1 \in S$, the random variable
$sk_2 \mid sk_1$ has high min-entropy. Moreover, conditioning on the additional bit of
information that $sk_1 \notin S$ leaves $sk_1$ with high min-entropy (which decreases as
the size of $S$ shrinks).

We proceed by a specific analysis for each of our two relaxed models. For 
explanatory reasons, we first discuss leakage with advice (our second relaxation) 
and then go back to the almost leakage resilience model (our first relaxation). 


Point-Wise $\lambda$-Leakage with Advice. In this model, the adversary $\mathcal{A}$ will leak
$L'(pk, sk)$, which is a randomized leakage function, defined by choosing a random
$\tau \in \{1,2\}$, setting $(sk_\tau, pk_\tau) = (sk, pk)$, choosing a new fresh key pair $(sk_i, pk_i)$,
where $i = 3 - \tau$, and outputting $L(pk_1, pk_2, sk_1, sk_2)$. In addition, it will use
one bit of advice, which is whether $sk_i \in S$. If so, the leakage function $L'(pk, sk)$
outputs $sk_i$ in addition to $L(pk_1, pk_2, sk_1, sk_2)$, and otherwise it outputs only
$L(pk_1, pk_2, sk_1, sk_2)$. Now we can prove that indeed, for many pairs $(pk, sk)$,
the leakage $L'(pk, sk)$ leaks at most $\lambda$ bits about $sk$ (and $\mathcal{B}$ breaks $\mathcal{E}^2$ on the
corresponding keys).
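The shape of this leakage function can be sketched in code. Everything named below (`make_single_leak`, `toy_keygen`, `toy_L`, `toy_in_S`) is a hypothetical placeholder. In particular, the membership test for $S$ is the possibly inefficient advice bit and is modelled here as an arbitrary predicate. Returning the two public keys follows the earlier variant of $L'$ and is an assumption for this one.

```python
import secrets

def make_single_leak(L, keygen, in_S):
    """Builds L' for the one-shot game from a two-key leakage function L.

    L(pk1, pk2, sk1, sk2) -> y   leakage function chosen by B
    keygen() -> (sk, pk)         key generator of the underlying scheme
    in_S(sk) -> bool             the one bit of (possibly inefficient) advice
    """
    def L_prime(pk, sk):
        tau = secrets.choice((1, 2))     # slot that receives the real key
        sk_i, pk_i = keygen()            # fresh key pair for slot i = 3 - tau
        if tau == 1:
            pks, sks = (pk, pk_i), (sk, sk_i)
        else:
            pks, sks = (pk_i, pk), (sk_i, sk)
        y = L(pks[0], pks[1], sks[0], sks[1])
        revealed = sk_i if in_S(sk_i) else None   # reveal sk_i only when it lies in S
        return y, pks, revealed
    return L_prime

# Toy instantiation (all three ingredients are placeholders, not a real scheme).
def toy_keygen():
    sk = secrets.randbits(16)
    return sk, sk >> 8                   # pk is just the high byte of sk

toy_L = lambda pk1, pk2, sk1, sk2: (sk1 ^ sk2) & 0xF
toy_in_S = lambda sk: sk % 2 == 0        # stand-in for membership in S

L_prime = make_single_leak(toy_L, toy_keygen, toy_in_S)
sk, pk = toy_keygen()
y, pks, revealed = L_prime(pk, sk)
```

With the XOR leakage used here, a revealed fresh key exposes four bits of `sk`, which is why the analysis has to argue about the min-entropy of `sk` separately for keys inside and outside $S$.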

Note that the leakage function $L'$ sometimes leaks more than it should.
Namely, in some cases the value $y \leftarrow L'(pk, sk)$ reduces the min-entropy of
$sk$ by $\lambda$, but in other cases it reduces the min-entropy of $sk$ by more than $\lambda$,³
and in these cases it is an invalid leakage function. For this reason, we need to
consider the point-wise $\lambda$-leakage definition. In addition, note that $L'$ used only
one bit of additional advice. Therefore when going from $\mathcal{E}$ to $\mathcal{E}^t$ the reduction
uses $\log t$ bits of advice.


Point-Wise Almost $\lambda$-Leakage. In this model, the idea of the reduction is the
following: The adversary $\mathcal{A}$ will leak $L'(sk, pk)$, which is a randomized leakage
function, defined by choosing a random $\tau \in \{1,2\}$, setting $(sk_\tau, pk_\tau) = (sk, pk)$,
choosing a new fresh key pair $(sk_i, pk_i)$, where $i = 3 - \tau$, and outputting
$L(pk_1, pk_2, sk_1, sk_2)$, and in addition with probability 1/2 outputting $sk_i$.

As in the model with advice, the leakage function $L'$ might leak more than
$\lambda$ bits about $sk$, and thus we use the point-wise definition. In the analysis, we
distinguish between the case that the set $S$ is noticeable and the case that it
is negligible. In the former, with non-negligible probability the leakage function
$L'$ will sample $sk_i \in S$ and will output it. In this case the leakage function is
legal. If the set $S$ is negligible, we claim the distribution of the secret key $sk_\tau$
is statistically close to the distribution of $sk_\tau$ conditioned on the event that
$sk_i \notin S$ (as the complementary event happens only with negligible probability).
Therefore, if $L'$ did not output the secret key $sk_i$, the secret key $sk_\tau$ is
statistically close to a distribution with high enough min-entropy. Due to this
analysis, we need to relax our leakage model to almost $\lambda$-leakage resilience.

Since the analysis in this model is asymptotic, we are not able to extend it
beyond a constant number of repetitions. See the discussion in Section 5.


1.4 Paper Organization 


We define our generalized notion of public-key primitives in Section 2, where we
also define parallel repetition and leakage attacks on such primitives. Our model
of point-wise leakage resilience is presented in Section 3. In Section 4 we explain
why all known leakage resilient schemes are also point-wise leakage-resilient.

Our parallel repetition theorems for a constant number of repetitions and for a
polynomial number of repetitions are presented in Sections 5 and 6, respectively.
In Section 7 we discuss what our theorems imply for schemes that are only
computationally indistinguishable from being secure in our model. Appendix A
contains the min-entropy splitting lemma that is used for all our proofs.

Due to space limitations, some proofs are omitted from this extended abstract
and can be found in the full version [6].


3 This happens when the set $S$ is very small, yet $sk_i \in S$.


2 Public-Key Primitives, Parallel-Repetition, Leakage 
Attacks 


In this section we give a definition of a public key primitive which generalizes one- 
way relations and public-key encryption under chosen plaintext attack (CPA). 
We then show how to define parallel repetition with respect to public-key primi- 
tives in a way that, again, generalizes the intuitive notions of parallel repetition 
for either one-way relations or public-key encryption. 


2.1 A Unified Framework for Public-Key Primitives 


We use the following formalization that generalizes both one-way relations and 
public-key encryption. 


Definition 2.1 (public-key primitive). A public-key primitive $\mathcal{E} = (G, V)$ is
a pair of PPT algorithms such that


— The key generator $G$ generates a pair of secret and public keys: $(sk, pk) \leftarrow G(1^k)$.
— The verifier $V$ is an oracle machine such that $V^{O}(pk)$ either accepts or
rejects.


Definition 2.2 (secure public-key primitive). A public-key primitive
$\mathcal{E} = (G, V)$ is secure if for any PPT oracle $\mathrm{break}$, it holds that


$$\Pr_{(sk,pk) \leftarrow G(1^k)}\left[ V^{\mathrm{break}(pk)}(pk) \text{ accepts} \right] = \mathrm{negl}(k) .$$


To be concrete, for one-way relations, the breaker needs to send a candidate
secret key $sk$ (= inversion of the public key), and the verifier runs the relation’s
verification procedure. Seeing why public key encryption can be stated in these
terms requires some work. The reason it is not immediate is that typically, we
would consider the interaction between the verifier and the breaker to be the
following: The verifier gives the breaker a challenge ciphertext $\mathrm{Enc}_{pk}(b)$, and
accepts if the breaker succeeds in guessing $b$. However, the breaker can clearly
cause the verifier to accept with probability 1/2, whereas we need to ensure that
the breaker succeeds only with negligible probability. This technical annoyance
can be fixed by considering the game where the verifier sends $\mathrm{poly}(k)$ challenge
ciphertexts to the breaker, each encrypting a random bit. The breaker succeeds
if it succeeds in guessing significantly more than 1/2 of the bits encrypted. The
formal definition and precise analysis are much more cumbersome. The proof
appears in the full version [6].
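To see why the many-ciphertext game removes the trivial 1/2 success probability, one can compute how often a breaker that guesses blindly clears the threshold. The sketch below assumes a concrete threshold of a $(1/2 + \delta)$ fraction of correct guesses with $\delta = 1/10$; the text only says "significantly more than 1/2", so this choice and the function name are illustrative.

```python
from fractions import Fraction
from math import comb

def blind_guess_success(n: int, delta: Fraction) -> float:
    """Pr[a uniformly random guesser gets more than (1/2 + delta) * n of n bits right]."""
    threshold = (Fraction(1, 2) + delta) * n
    hits = sum(comb(n, j) for j in range(n + 1) if j > threshold)
    return float(Fraction(hits, 2 ** n))

single = blind_guess_success(1, Fraction(0))   # the one-ciphertext game
tails = [blind_guess_success(n, Fraction(1, 10)) for n in (50, 200, 800)]
```

In the one-ciphertext game the blind guesser wins with probability 1/2, whereas in the repeated game its success probability shrinks quickly as the number of challenge ciphertexts grows.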

Note that our verifier (which corresponds to the challenger in “security game” 
based definitions) only gets the public key as input and not the secret key. If 
the secret key was also given, then all public-key encryption schemes, signature 
schemes, and one-way relations, would trivially fit into this framework. However, 
in this work, we only consider primitives where the verifier V does not use the 
secret key sk to verify, but uses only the public key pk. An example of such a 
primitive is public-key encryption (under CPA). However, signature schemes or 
CCA secure encryption schemes do not fall into this category, since for these 
primitives the verifier in the definition above does need to know the secret key 
sk in order to simulate the signing oracle, in the case of signature schemes, and 
to simulate the decryption oracle, in the case of CCA encryption schemes. 


2.2 Parallel Repetition 


Definition 2.3 ($t$-parallel repetition). For any public-key primitive
$\mathcal{E} = (G, V)$ and any $t \in \mathbb{N}$, its $t$-parallel repetition, denoted
$\mathcal{E}^t = (G^t, V^t)$, is in itself a public-key primitive defined as follows:


— The key generator $(sk^t, pk^t) \leftarrow G^t(1^k)$ generates $(sk_i, pk_i) \leftarrow G(1^k)$ for all
$i \in [t]$ and outputs $sk^t \triangleq (sk_1, \ldots, sk_t)$, $pk^t \triangleq (pk_1, \ldots, pk_t)$.

— The verifier $(V^t)^{O}(pk^t)$ runs $V^{O}(pk_i)$ for all $i \in [t]$, and accepts
if and only if they all accept.


A direct product of $t$ schemes $\mathcal{E}_1 \times \cdots \times \mathcal{E}_t$ is defined similarly.

While it is straightforward that our definition captures the notion of parallel 
repetition for one-way relations (where the goal is to find legal pre-images for all 
input public-keys), let us be a little more explicit about how the above captures 
parallel repetition for public-key encryption. 


Lemma 2.4. Let $\mathcal{E} = (G, V)$ be a public-key primitive that represents a public-key
encryption scheme and let $t \in \mathbb{N}$. Then there exists a public key encryption
scheme that is represented by $\mathcal{E}^t$.

Moreover, this scheme is obtained by secret sharing the message into t shares 
and encrypting share i with pk;. To decrypt, decrypt all shares and restore the 
message. 


The proof is straightforward and is omitted. 
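The construction in Lemma 2.4 can be sketched as follows. Two assumptions are made for the illustration: the secret sharing is XOR-based (the lemma does not fix a sharing scheme), and the underlying scheme is a deliberately insecure placeholder in which the "public" key equals the secret key. All function names are hypothetical.

```python
import hashlib
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# --- Placeholder scheme (NOT secure, NOT really public-key): pk equals sk. ---
def toy_keygen():
    sk = secrets.token_bytes(16)
    return sk, sk                              # (sk, pk)

def _pad(key: bytes, nonce: bytes, n: int) -> bytes:
    return hashlib.shake_256(key + nonce).digest(n)

def toy_enc(pk: bytes, m: bytes):
    nonce = secrets.token_bytes(16)
    return nonce, xor(m, _pad(pk, nonce, len(m)))

def toy_dec(sk: bytes, ct):
    nonce, body = ct
    return xor(body, _pad(sk, nonce, len(body)))

# --- t-parallel repetition: share the message, encrypt share i under pk_i. ---
def rep_keygen(t: int):
    pairs = [toy_keygen() for _ in range(t)]
    return [sk for sk, _ in pairs], [pk for _, pk in pairs]

def rep_enc(pks, m: bytes):
    shares = [secrets.token_bytes(len(m)) for _ in pks[:-1]]
    shares.append(reduce(xor, shares, m))      # last share makes all shares XOR to m
    return [toy_enc(pk, s) for pk, s in zip(pks, shares)]

def rep_dec(sks, cts):
    shares = [toy_dec(sk, ct) for sk, ct in zip(sks, cts)]
    return reduce(xor, shares)                 # restore the message

sks, pks = rep_keygen(4)
message = b"parallel repetition"
roundtrip = rep_dec(sks, rep_enc(pks, message))
```

Any `t - 1` shares are uniformly random, so recovering the message requires decrypting under all `t` keys, matching a verifier that accepts only if every copy is broken.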


2.3 Leakage Attacks 


In this section, we generalize the notion of leakage attacks to our public-key 
primitive framework. Note that we do not define what it means for a scheme to 
be secure, only present a model for an attack.


Definition 2.5 (leakage attack). We consider adversaries of the form
$\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$, where $\mathrm{leak}_{\mathcal{A}}$, $\mathrm{break}_{\mathcal{A}}$ are (possibly
randomized) functions. We refer to $\mathrm{leak}_{\mathcal{A}}$ as the leakage function and to
$\mathrm{break}_{\mathcal{A}}$ as the breaker.

A leakage attack of an adversary $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$ on a public-key
primitive $\mathcal{E} = (G, V)$ (with security parameter $k$) is the following process.


— Initialize: Generate a key pair $(sk, pk) \leftarrow G(1^k)$.

— Leak: Apply the leakage function on the key pair to obtain the leakage value
$y \leftarrow \mathrm{leak}_{\mathcal{A}}(pk, sk)$.

— Break: $\mathcal{A}$ succeeds if $V^{\mathrm{break}_{\mathcal{A}}(pk, y)}(pk)$ accepts.
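The three phases of the attack can be sketched as a small driver. The primitive used below is an assumption made for illustration: a toy one-way relation with $pk$ equal to the SHA-256 hash of $sk$, and a verifier that queries its oracle once for a candidate pre-image. The function names are hypothetical.

```python
import hashlib
import secrets

def G(k: int):
    """Key generator of a toy one-way relation: pk = SHA-256(sk)."""
    sk = secrets.token_bytes(k // 8)
    return sk, hashlib.sha256(sk).digest()

def V(oracle, pk: bytes) -> bool:
    """Verifier: asks the oracle for a candidate pre-image and checks it."""
    candidate = oracle()
    return hashlib.sha256(candidate).digest() == pk

def leakage_attack(leak, brk, k: int = 128) -> bool:
    sk, pk = G(k)                        # Initialize
    y = leak(pk, sk)                     # Leak
    return V(lambda: brk(pk, y), pk)     # Break: V interacts with break(pk, y)

# Two example adversaries (leak, break).
leak_all = lambda pk, sk: sk             # leaks the entire key
leak_byte = lambda pk, sk: sk[:1]        # leaks only the first byte
use_leak = lambda pk, y: y               # breaker outputs the leakage as its guess

full_leak_wins = leakage_attack(leak_all, use_leak)
one_byte_wins = leakage_attack(leak_byte, use_leak)
```

Note that the verifier receives only `pk`; the secret key is touched solely by the leakage function, which is the property the amplification theorems rely on.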


3  Point-Wise Leakage Resilience 


In this work, we consider “noisy leakage” functions, which are only allowed to 
reduce the (average) min-entropy of the secret key by a bounded amount. How- 
ever, we relax the min-entropy restriction, and consider a point-wise definition, 
where we require that the specific leakage value is legal (as opposed to requiring 
that the leakage function is always legal). 

We define our new model below. Then, in Sections 3.1 and 3.2, we present two
relaxed versions of point-wise leakage resilience that we need in order to prove
our parallel repetition theorems. Finally, in Section 3.3 we show that all of these
notions are strictly stronger than the old bounded-leakage model of [1]. Namely,
security w.r.t. our definitions implies, as a special case, security w.r.t. bounded
leakage.


Definition 3.1 (point-wise $\lambda$-leakage). Let $\mathcal{E} = (G, V)$ be a public key primitive.
A possibly randomized leakage function $\mathrm{leak}$ is $\lambda$-leaky at point $(pk, y)$, where
$pk$ is a public key and $y$ is a leakage value (in the image of $\mathrm{leak}$), if


$$H_\infty(S_{pk,y}) \ge H_\infty(S_{pk}) - \lambda ,$$


where $S_{pk}$ is the distribution of secret keys conditioned on the public key being
$pk$, and $S_{pk,y}$ is the distribution of secret keys conditioned on both the public key
being $pk$ and on $\mathrm{leak}(pk, sk) = y$.
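For a finite toy distribution the condition of Definition 3.1 can be checked directly. The sketch below assumes a fixed public key, a deterministic leakage function (the definition also allows randomized ones) and a uniform 8-bit key; the helper names are hypothetical.

```python
from fractions import Fraction
from math import log2

def min_entropy(dist) -> float:
    """H_inf of a distribution given as {outcome: probability}."""
    return -log2(max(dist.values()))

def conditional_on_leak(prior, leak, y):
    """Distribution of sk given leak(sk) = y, for a deterministic leak."""
    mass = {sk: p for sk, p in prior.items() if leak(sk) == y}
    total = sum(mass.values())
    return {sk: p / total for sk, p in mass.items()}

def is_leaky_at(prior, leak, y, lam) -> bool:
    """Checks H_inf(S_{pk,y}) >= H_inf(S_pk) - lam for a fixed pk."""
    posterior = conditional_on_leak(prior, leak, y)
    return min_entropy(posterior) >= min_entropy(prior) - lam

# Toy S_pk: secret keys are uniform over 8-bit strings (pk is held fixed).
S_pk = {sk: Fraction(1, 256) for sk in range(256)}

# Uneven leakage: output the key itself if it is below 4, otherwise output "big".
leak = lambda sk: sk if sk < 4 else "big"

ok_at_big = is_leaky_at(S_pk, leak, "big", lam=1)   # 252 candidate keys remain
ok_at_3 = is_leaky_at(S_pk, leak, 3, lam=1)         # a single candidate key remains
```

The same function is 1-leaky at the point with value `"big"` but not at the point with value `3`, which is exactly the distinction a point-wise definition is able to express.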


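Definition 3.2 (point-wise $\lambda$-leakage resilience). A public-key primitive
$\mathcal{E} = (G, V)$ is point-wise $\lambda$-leakage-resilient if for any PPT adversary $\mathcal{A}$, where
$\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$, it holds that


$$\mathrm{Adv}_{\mathcal{E},\lambda}[\mathcal{A}] \triangleq \Pr\left[ (\mathrm{leak}_{\mathcal{A}} \text{ is } \lambda\text{-leaky at } (pk, y)) \wedge (\mathcal{A}(pk, y) \text{ succeeds}) \right] = \mathrm{negl}(k) ,$$


where the probability is taken over $(sk, pk) \leftarrow G(1^k)$, over the random coin tosses
of $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$, and over the random coin tosses of the verifier in the
verification game.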


In order to obtain our direct product theorems for leakage resilience, we relax 
the point-wise leakage resilience definition in two (incomparable) ways.


3.1 First Relaxation: Almost Leakage Resilience


In this relaxation, instead of requiring that sk has high min-entropy conditioned 
on pk and y = leak(pk, sk), we require that the distribution of sk (conditioned 
on pk, y) is statistically close to one that has high min-entropy. 


Definition 3.3 (close to $\lambda$-leaky). A leakage function $\mathrm{leak}$ is $\mu$-close to
$\lambda$-leaky at point $(pk, y)$ if there exists a distribution $\tilde{S}_{pk,y}$ that is $\mu$-close to
$S_{pk,y}$ and


$$H_\infty(\tilde{S}_{pk,y}) \ge H_\infty(S_{pk}) - \lambda .$$


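Definition 3.4 (resilience to almost $\lambda$-leakage). $\mathcal{E} = (G, V)$ is point-wise
almost $\lambda$-leakage-resilient if for any PPT adversary $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$ and for
any negligible function $\mu$, it holds that


$$\mathrm{Adv}_{\mathcal{E},\lambda,\mu}[\mathcal{A}] \triangleq \Pr\left[ (\mathrm{leak}_{\mathcal{A}} \text{ is } \mu\text{-close to } \lambda\text{-leaky at } (pk, y)) \wedge (\mathcal{A}(pk, y) \text{ succeeds}) \right] = \mathrm{negl}(k) ,$$


where the probability is taken over $(sk, pk) \leftarrow G(1^k)$, over the random coin tosses
of $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$, and over the random coin tosses of the verifier in the
verification game.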


Under this definition we obtain a direct-product theorem for a constant number
of repetitions.


3.2 Second Relaxation: Leakage Resilience with Advice 


To obtain a direct-product theorem for a super-constant number of repetitions, 
we use a slightly different (and incomparable) model, where we do not allow 
statistical closeness, but rather allow the attacker to get a logarithmic number 
of bits of (possibly inefficient) advice. 


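Definition 3.5 ($\mathrm{PPT}^{a}$). We say that a function $f$ is $\mathrm{PPT}^{a}$ computable if the
function $f_{-a}$, defined below, is PPT computable. The function $f_{-a}$ is identical to
$f$, except that the last $a$ bits of its output are truncated.

We say that an adversary $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$ is a $\mathrm{PPT}^{a}$ adversary if $\mathrm{leak}_{\mathcal{A}}$
is $\mathrm{PPT}^{a}$ computable and $\mathrm{break}_{\mathcal{A}}$ is PPT computable.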


Definition 3.6 (point-wise $\lambda$-leakage with advice). A public-key primitive
$\mathcal{E} = (G, V)$ is resilient to point-wise $\lambda$-leakage and logarithmic advice if for any
$\mathrm{PPT}^{O(\log k)}$ adversary $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$ it holds that


$$\mathrm{Adv}_{\mathcal{E},\lambda}[\mathcal{A}] \triangleq \Pr\left[ (\mathrm{leak}_{\mathcal{A}} \text{ is } \lambda\text{-leaky at } (pk, y)) \wedge (\mathcal{A}(pk, y) \text{ succeeds}) \right] = \mathrm{negl}(k) ,$$


where the probability is taken over $(sk, pk) \leftarrow G(1^k)$, over the random coin tosses
of $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$, and over the random coin tosses of the verifier in the
verification game.


3.3 Relation to Bounded Leakage


To conclude, we prove that point-wise $\lambda$-leakage resilience implies the basic form
of $\lambda$-bounded leakage. A proof sketch appears in the full version [6].


Definition 3.7 ([1]). A public-key primitive $\mathcal{E} = (G, V)$ is $\lambda$-bounded leakage
resilient if any PPT adversary $\mathcal{A} = (\mathrm{leak}_{\mathcal{A}}, \mathrm{break}_{\mathcal{A}})$ for which the output of $\mathrm{leak}_{\mathcal{A}}$
is at most $\lambda$ bits, succeeds with negligible probability.


Lemma 3.8. If $\mathcal{E} = (G, V)$ is point-wise $\lambda$-leakage resilient then it is also
$\lambda$-bounded leakage resilient.


We note that point-wise almost $\lambda$-leakage resilience, and $\lambda$-leakage resilience with
logarithmic advice, are stronger notions of security (they give the adversary more 
power) and thus the above immediately applies to these notions as well. 


4 Why Known Schemes Are Point-Wise Leakage 
Resilient 


In this section, we show that leakage resilience is amplified by parallel repetition 
for, essentially, all known schemes that are resilient to non-trivial (i.e. super- 
logarithmic) leakage. To show this, we sketch a proof template that is shared 
among all (non trivial) leakage resilient results, and we show that this proof 
template proves security also w.r.t. our leakage models (the point-wise almost
$\lambda$-leakage model, and the point-wise $\lambda$-leakage with logarithmic advice model).


The Proof Template. The proof template for proving leakage resilience is very
simple, and works in two hybrid steps. Recall that the adversary first gets a pair
$(pk, y = L(pk, sk))$, where $L$ is a poly-size leakage function chosen by $\mathcal{A}$. Then it
chooses messages $m_0, m_1$ and gets a challenge ciphertext $c_b \leftarrow \mathrm{Enc}_{pk}(m_b)$. The
adversary wins if it guesses the bit $b$ correctly.

The first step in the template is to replace the challenge $c_b$ with an “illegally”
generated ciphertext $c_b^*$, such that $(sk, pk, c_b) \approx_c (sk, pk, c_b^*)$ (and it is efficient to
generate $c_b^*$ given $sk, pk, b$). Due to computational indistinguishability, the
adversary’s success probability should remain unchanged. We note that there is no
entropy involved in this part, only a requirement that $L$ is efficiently computable.

The second step is completely information theoretic: It is proven that if the
distribution of the secret key conditioned on $pk, y$, which we denote by $S_{pk,y}$, has
sufficient min-entropy, then $c_b^*$ carries no information on $b$ (or, more precisely,
that conditioned on the view of the adversary, $b$ is statistically close to uniform).
Therefore, no adversary can guess its value with non-negligible advantage.


Point-Wise Leakage Resilience. The above proof template also proves point-wise
leakage resilience. The second step of the hybrid works in a point-wise manner
and therefore we only need to worry about the first step. In the first step, clearly
computational indistinguishability still holds, but proving that the point-wise
advantage remains unchanged is a bit harder, since we cannot efficiently check
the point-wise advantage. Nevertheless, we argue that if the advantage of $\mathcal{A}$ is
non-negligible, then it drops by a factor of at most two. Such a claim is sufficient
for the next level of the template.

To see why this is the case, consider an adversary $\mathcal{A}$ that has non-negligible
point-wise advantage $\epsilon$ when given $(pk, y, c_b)$, but less than $\epsilon/2$ when given
$(pk, y, c_b^*)$. Recall that the advantage measures the probability of both $\mathcal{A}$
succeeding (in the verification game) and $pk, y$ being point-wise $\lambda$-leaky. It follows
that with non-negligible probability over $pk, y$, the conditional success probability
of $\mathcal{A}$, conditioned on $pk, y$, drops by at least $\epsilon/4$ (otherwise the advantage,
which measures over a subset of the $pk, y$, couldn’t have dropped).

A distinguisher $\mathcal{B}(sk, pk, c_b / c_b^*)$ is defined as follows: First, compute the
leakage $y := L(sk, pk)$. Then generate many samples of $c_b / c_b^*$ and use them to evaluate
the success probability of $\mathcal{A}$ conditioned on $pk, y$ in the two cases. If indeed $pk, y$
are such that the success probability drops, use $\mathcal{A}$ to distinguish between the
two cases. If no noticeable change in the success probability was observed, then
output a random guess. Putting it all together, we get a polynomial-time distinguisher
between $(pk, y, c_b)$ and $(pk, y, c_b^*)$, in contradiction to the hardness assumption.

We note that this is true even if $y$ is not fully known to the distinguisher: say
$O(\log k)$ bits of $y$ are not known, the distinguisher can still try all options and
check if for either one the success probability changes by $\epsilon/4$.


Our Relaxed Models of Point-Wise Leakage Resilience. Our first relaxation, of
allowing the secret key to be statistically close to $\lambda$-leakage resilient, only affects
the second step of the template. We can still argue that $b$ is statistically close
to uniform by adding another hybrid where the conditional distribution $S_{pk,y}$ is
replaced with a statistically indistinguishable $\tilde{S}_{pk,y}$ that has high min-entropy.

Our second relaxation, of allowing logarithmic advice, goes into the first step 
(this is the only step where we care about the complexity of L). As we explained 
above, our argument works even if a logarithmic part of the leakage value is not 
known. Therefore we will use only the efficient part of the leakage function and 
computational indistinguishability will still hold. 


Computationally Indistinguishable Schemes. For some schemes, such as [1] [16],
leakage resilience is proven by showing that they are computationally
indistinguishable from another scheme which, in turn, is proven leakage resilient using
the template. We show in Section 7 that this still implies that parallel repetition
amplifies leakage resilience.


5 Direct-Product Theorem for a Constant Number of 
Repetitions 


In this section, we prove a direct-product theorem for a constant number of
repetitions, w.r.t. point-wise almost leakage-resilience as defined in Section 3.1.


Theorem 5.1. Let $c \in \mathbb{N}$ be a constant, and for every $i \in [c]$, let $\mathcal{E}_i = (G_i, V_i)$ be
a point-wise almost $\lambda_i$-leakage-resilient public-key primitive. Then, $\mathcal{E}_1 \times \cdots \times \mathcal{E}_c$
is point-wise almost $\lambda$-leakage-resilient, where $\lambda = \sum_{i} (\lambda_i - 1)$.


It suffices to prove this theorem for $c = 2$, and apply it successively. In order to
simplify notation, we prove it for the case of parallel repetition, where $\mathcal{E}_1 = \mathcal{E}_2$;
the proof extends readily to the case of direct product.


Lemma 5.2. Let $\mathcal{E} = (G, V)$ be a point-wise almost $\lambda$-leakage-resilient public-key
primitive. Then $\mathcal{E}^2$ is point-wise almost $(2\lambda - 1)$-leakage-resilient.


Before we present the outline of the proof, let us make a few remarks. 


1. Note that there is a loss of one bit in the amplification. Namely, we go from $\lambda$
to $(2\lambda - 1)$ instead of just $2\lambda$. While some loss in the parameters is implied by
our techniques, more detailed analysis can show that the composed scheme
is in fact $(2\lambda - \delta)$-leakage resilient for any $\delta(k) = 1/\mathrm{poly}(k)$. Thus the loss
incurred is less than a single bit. As our result is qualitative in nature, we
chose not to overload with the additional complication.

2. While at first glance one could imagine that Theorem 5.1 should extend
beyond constant $c$, we were unable to prove such an argument. The reason is
that super-constant repetition gives a different scheme for each value of the
security parameter. This means that we cannot use Theorem 5.1 as a black
box. More importantly, our proof techniques rely on the asymptotic behavior
of the scheme so we were not able to even change the proof to apply for a
super-constant number of repetitions.

A result for the more general case of any polynomial number of repetitions
is presented, in the slightly different and incomparable “advice” model, in
Section 6.

Finally, we remark that known negative results for security of parallel 
repetition are already effective for a constant number of repetitions. Thus 
our result contrasts them even for this case. 


Proof overview of Lemma 5.2. We consider an adversary $\mathcal{B}$ that succeeds in
the parallel repetition game, and construct an adversary $\mathcal{A}$ that succeeds in the
single instance game. The straightforward proof strategy would be to “plant”
the “real” key pair, that is given as input to $\mathcal{A}$, as one of the key pairs that
are input to $\mathcal{B}$, and sample the other pair uniformly.⁴ In such a case, the input
to $\mathcal{B}$ is distributed identically to the parallel repetition case and indeed $\mathcal{B}$ will
succeed with noticeable probability. However, we may no longer be able to claim
that our leakage leaves sufficient entropy in the secret key. We are guaranteed
by the functionality of $\mathcal{B}$ that the key pair $(sk_1, sk_2)$ is left with sufficient
min-entropy but it is still possible that neither $sk_1$ nor $sk_2$ have any min-entropy by
themselves.

To solve the above we use Lemma A.1, which essentially says how to split-up
the joint entropy of two random variables. Specifically it says that either $sk_1$ or
$sk_2 \mid sk_1$ will have sufficient min-entropy, depending on whether $sk_1$ belongs to a
hard-to-recognize set $R$, and conditioned on the knowledge of whether $sk_1 \in R$.
Namely, either $sk_1 \mid \mathbf{1}_{sk_1 \in R}$ or $sk_2 \mid (sk_1, \mathbf{1}_{sk_1 \in R})$ have high min-entropy. If we
could compute the bit $\mathbf{1}_{sk_1 \in R}$, we would have been done (and indeed if we are
allowed one bit of inefficient leakage, an easier proof follows, see Section 6). Since
this is impossible, we turn to case analysis:

Obviously, if $\Pr[sk_1 \in R] = \mathrm{negl}(k)$, then we can always guess that $\mathbf{1}_{sk_1 \in R} = 0$
and be right almost always. This implies that in such a case $sk_2 \mid sk_1$ is statistically
indistinguishable from having high min-entropy, as we wanted.

For the second case, if $\Pr[sk_1 \in R] \ge 1/\mathrm{poly}(k)$, then $sk_2 \mid (sk_1, \mathbf{1}_{sk_1 \in R})$ will
have high min-entropy for a noticeable part of the time. To complete the analysis
here, we notice that


$$H_\infty(sk_2 \mid (sk_1, \mathbf{1}_{sk_1 \in R})) = H_\infty(sk_2 \mid sk_1) .$$


This is because $R$ is a well defined set and thus $\mathbf{1}_{sk_1 \in R}$ is a deterministic (though
hard to compute) function of $sk_1$. It follows that $sk_2 \mid sk_1$ will have high
min-entropy for a noticeable fraction of the time, which completes the proof.

For the formal proof, see the full version [6].


4 We note that even this step is impossible when relying on “secretly generated” public
parameters as in the scheme presented in [16] (or rather, the scheme that is
computationally indistinguishable from theirs and actually has entropic leakage resilience).


6 Direct-Product Theorem for Polynomially Many 
Repetitions 


In this section we present a direct product theorem that applies to any polyno- 
mial number of repetitions. This theorem is relative to the advice model defined 
in Section 3.2. For the sake of simplicity, we will assume that the number of
repetitions is a power of 2, although the same techniques can be used for any 
number. 


Theorem 6.1. Let $\mathcal{E} = (G, V)$ be a public-key primitive that is resilient to point-wise
$\lambda$-leakage and logarithmic advice. Let $t = t(k)$ be a polynomially bounded
function of the security parameter such that $t(k)$ is always a power of 2. Then
$\mathcal{E}^t$ is resilient to point-wise $t(\lambda - 1)$-leakage and logarithmic advice.


Towards proving the theorem, we present the following lemma, which is a pa- 
rameterized special case of the above theorem, and will imply the theorem by 
successive applications. 


Lemma 6.2. For any public-key primitive E = (G, V) and any PPT_g adversary 
B = (leakg, break) for E°, there exists a PPT _(q41) adversary A = (leak 4, breaka) 
for E, such that for all k, 


Adve,,[A] > (1/4) - Adve (2,-1)[B] - 


The theorem immediately follows by applying the lemma $\log t$ times. See proofs
in the full version [6]. 
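
The bookkeeping behind the repeated application of the lemma can be sketched as follows. This is only an illustration under an assumed reading of Lemma 6.2: each doubling step maps the leakage parameter $\lambda$ to $2\lambda - 1$, costs one extra bit of advice, and loses a factor of 1/4 in advantage. The function name and return format are hypothetical.

```python
from fractions import Fraction


def iterate_doubling_lemma(lam: int, t: int):
    """Track parameters when the doubling step is applied log2(t) times.

    ASSUMPTION (our reading of Lemma 6.2): one step goes from E to E^2,
    turns leakage parameter lam into 2*lam - 1, adds one bit of advice,
    and multiplies the advantage by 1/4.
    """
    if t < 1 or t & (t - 1) != 0:
        raise ValueError("t must be a power of 2")
    steps = t.bit_length() - 1  # equals log2(t) for a power of 2
    leakage = lam
    loss = Fraction(1)
    for _ in range(steps):
        leakage = 2 * leakage - 1
        loss *= Fraction(1, 4)
    return leakage, steps, loss


leakage, extra_advice_bits, loss = iterate_doubling_lemma(lam=5, t=8)
```

Under these assumptions the leakage parameter after $\log t$ steps is $t(\lambda - 1) + 1$, which is at least the $t(\lambda - 1)$ claimed in the theorem, the advice grows by $\log t$ bits (so it stays logarithmic), and the advantage degrades by $1/t^2$, which is polynomial.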
7 Leakage from Computationally Indistinguishable
Schemes 


Our definition of point-wise leakage resilience is based on the residual min- 
entropy of the secret key, conditioned on the leakage value. In the literature, 
starting with [18], this is referred to as “resilience to noisy leakage”. It is self-evident
that schemes where the public key is an injective function of the secret key
cannot be proven leakage resilient in this respect. This is because even leaking
the secret key in its entirety, which obviously breaks security, does not reduce its
min-entropy conditioned on the public key (the conditional min-entropy is 0 to
begin with, and it stays 0 after the leakage). We do know, however, of such injective
public-key encryption schemes that are proven to be leakage resilient with
respect to the weaker notion of “length-bounded leakage”. There, the restriction
on the leakage function is that it has bounded length. Notable examples are the
scheme of [1] and the scheme of [16] (which was introduced as a counterexample
for parallel repetition of length-bounded leakage resilience, see Section 1.2).
While at first glance it may seem that our result is completely powerless with
regard to such schemes, we show in this section that for all known schemes, and
specifically for the schemes of [1], [16], our theorem in fact does imply parallel
repetition.

The key observation, upon revisiting the proofs of security of [1], [16], is that
in both cases, the proof is by presenting a second scheme in which the key 
distribution is computationally indistinguishable from the original scheme (but 
may have undesired features such as worse efficiency of key generation), and 
proving that this second scheme is resilient to leakage of bounded length. This 
implies that the original scheme is resilient to bounded leakage as well (since 
otherwise one can distinguish the key generation processes). The second scheme, 
in these two cases, is in fact resilient to noisy leakage. Furthermore, the second 
scheme in the two cases adheres to our notion of point-wise leakage resilience. 

In light of the above, we put forth the following corollary of Theorems 5.1
and 6.1.


Corollary 7.1. Let $\mathcal{E} = (G, V)$ be a public-key primitive, and let $G'$ be such
that $G(1^k) \stackrel{c}{\approx} G'(1^k)$. Then:


1. If $\mathcal{E}' = (G', V)$ is point-wise almost $\lambda$-leakage resilient, then $\mathcal{E}^t$ is
$t \cdot (\lambda - 1)$-bounded leakage resilient for any constant $t \in \mathbb{N}$.

2. If $\mathcal{E}' = (G', V)$ is point-wise $\lambda$-leakage resilient with logarithmic advice, then
$\mathcal{E}^t$ is $t \cdot (\lambda - 1)$-bounded leakage resilient for any polynomial $t = t(k)$.


Proof. The proof of the two parts is almost identical: We use either Theorem 5.1
or Theorem 6.1 to show that $(\mathcal{E}')^t$ is point-wise almost $t \cdot (\lambda - 1)$-leakage resilient,
or, respectively, leakage resilient with logarithmic advice. By Lemma 3.8 this
means that $(\mathcal{E}')^t$ is $t \cdot (\lambda - 1)$-bounded leakage resilient.

By a hybrid argument, $(G')^t(1^k) \stackrel{c}{\approx} G^t(1^k)$. Therefore, it must be that $\mathcal{E}^t$
is also $t \cdot (\lambda - 1)$-bounded leakage resilient (otherwise there is a distinguisher
between the key generators). This completes the proof.


Using Corollary 7.1, we can show that t-parallel repetition of the schemes of [1],
[16] indeed amplifies their leakage resilience.


A How to Split Min-Entropy


We present a lemma that shows that the joint min-entropy of two random vari- 
ables can be split between them under some condition. Variants of this lemma 
appeared in previous works (e.g., [8], [20]); this formulation is from [7].


Lemma A.1 (min-entropy split). Let X, Y be such that $H_\infty(X, Y) \ge a + b$,
for $a, b > 0$. Then there exists a set $R_X$, which is a subset of the support of X,
such that both:


1. For all $x \in R_X$, it holds that $H_\infty(Y \mid X = x) \ge b$.
2. $H_\infty(X \mid X \notin R_X) \ge a - \log(1/\epsilon)$, where $\epsilon = \Pr[X \notin R_X]$.


Proof. Define
$R_X \triangleq \{x : \Pr[X = x] > 2^{-a}\}.$


Then for all $x \in R_X$ and for all y, it holds that $\Pr[Y = y \mid X = x] < 2^{-b}$, and
thus $H_\infty(Y \mid X = x) \ge b$. In addition, $H_\infty(X \mid X \notin R_X) \ge a + \log \Pr[X \notin R_X]$,
i.e., $H_\infty(X \mid X \notin R_X) \ge a - \log(1/\epsilon)$.
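
The lemma can be sanity-checked numerically on a small example. The sketch below is an illustration only: the joint distribution, the parameters a = 2 and b = 1, and all function names are chosen here and are not taken from the cited works. It builds the set of "heavy" values of X as in the proof and evaluates both conclusions.

```python
from collections import defaultdict
from math import log2


def min_entropy(dist):
    """Min-entropy of a distribution given as {outcome: probability}."""
    return -log2(max(dist.values()))


def split(joint, a):
    """joint maps (x, y) to a probability; returns the quantities in the lemma."""
    px = defaultdict(float)
    for (x, _y), p in joint.items():
        px[x] += p
    # Heavy values of X, as defined in the proof.
    r_x = {x for x, p in px.items() if p > 2.0 ** (-a)}
    # Item 1: min-entropy of Y conditioned on X = x, for each heavy x.
    cond_entropy = {
        x: min_entropy({y: p / px[x] for (x2, y), p in joint.items() if x2 == x})
        for x in r_x
    }
    # Item 2: min-entropy of X conditioned on X falling outside the heavy set.
    eps = sum(p for x, p in px.items() if x not in r_x)
    rest = {x: p / eps for x, p in px.items() if x not in r_x} if eps > 0 else {}
    rest_entropy = min_entropy(rest) if rest else None
    return r_x, eps, cond_entropy, rest_entropy


# Toy joint distribution with joint min-entropy 3 = a + b.
joint = {(0, y): 1 / 8 for y in range(4)}
joint.update({(x, y): 1 / 16 for x in range(1, 5) for y in range(2)})
a, b = 2, 1
r_x, eps, cond_entropy, rest_entropy = split(joint, a)
```

In this example only the value 0 of X is heavy, Y retains 2 bits of min-entropy given X = 0, and X retains 2 bits given that it is not heavy, so both bounds hold with room to spare.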


Leakage-Tolerant Interactive Protocols* 


Nir Bitansky¹,², Ran Canetti¹,², and Shai Halevi³


1 Tel Aviv University 
2 Boston University 
3 IBM T.J. Watson Research Center 


Abstract. We put forth a framework for expressing security require- 
ments from interactive protocols in the presence of arbitrary leakage. 
The framework allows capturing different levels of leakage-tolerance of 
protocols, namely the preservation (or degradation) of security, under 
coordinated attacks that include various forms of leakage from the secret 
states of participating components. The framework extends the univer- 
sally composable (UC) security framework. We also prove a variant of 
the UC theorem that enables modular design and analysis of protocols 
even in face of general, non-modular leakage. 

We then construct leakage-tolerant protocols for basic tasks, such 
as secure message transmission, message authentication, commitment, 
oblivious transfer and zero-knowledge. A central component in several of 
our constructions is the observation that resilience to adaptive party cor- 
ruptions (in some strong sense) implies leakage-tolerance in an essentially 
optimal way. 


1 Introduction 


Traditionally, cryptographic protocols are studied in a model where participants 
have a secret state that is assumed to be completely inaccessible by the adversary. 
In this model, the adversary can only influence the system via anticipated inter- 
faces (such as, the communication among parties). These interfaces are crossed 
only when the adversary manages to fully corrupt a party, thus gaining access 
to its entire inner state. 

In reality, an intermediate setting often emerges, when the adversary manages 
to gain some partial information on the secret state of uncorrupted parties. This 
information, termed leakage, can be obtained by a variety of side-channel attacks
that bypass the usual interfaces and are often undetectable. Known examples 
include: timing, power, EM-emission, and cache attacks (see for a survey). 

The threat of leakage has gained much attention in the past few years, giving
rise to an impressive array of leakage-resilient schemes for basic cryptographic
tasks such as encryption and signatures, as well as general non-interactive circuits
(e.g., [DP08], [AGV09], [ADW09], [DKL09], [Pie09], [NS09], [ADN+10], [BKKV10],
[BSW11]). Most of the work concentrates on preserving,
in the presence of leakage, the same functionality and security guarantees
that the original primitives guarantee in a leak-free setting. Such strong leakage- 
resilience is typically guaranteed only when the leakage is restricted in some 
ways. Examples include: assuming bounded amounts of leakage, assuming that 
leakage only occurs in specific times (e.g., prior to encryption), or assuming that 
leakage is limited to specific parts of the state, such as the active parts in the 
only computation leaks model [MR04].

However, in many cases maintaining the same level of security as in a leak- 
free setting may be too costly, or even outright impossible. To exemplify this, 
consider the task of secure message transmission (SMT), where a sender wishes 
to transmit a (secret) message m to a receiver, so that the contents of m remain 
completely hidden from any adversary witnessing the communication. In the 
leak-free setting, the problem is easily solved using standard semantically secure 
encryption; however, in the presence of leakage, this is no longer the case. In 
fact, semantic security is not achievable at all: an adversary that can get even 
one bit of arbitrary leakage, from either party, can certainly learn any bit of the 
message, since this bit must reside in the party’s leaky memory at some point. 

Nevertheless, this inherent difficulty does not imply that we should give up on 
security altogether, but rather that we should somehow meaningfully relax the 
security requirements from protocols in the presence of leakage. Concretely, in 
the above example, we would like to design schemes in which one bit of leakage
on the message does not compromise the security of the entire message. More
generally, we would like to establish a framework that allows us to express and
analyze the security of general cryptographic tasks in the presence of general (non-restricted)
leakage, where the level of security may gracefully degrade according
to the amount of leakage (that might develop over time). A first step in this 
direction was taken by Halevi and Lin in the context of encryption. 

Another intriguing question is what the composability properties of resilience
to leakage are. Can one combine two or more schemes and deduce leakage-resilience
of the combined system based only on the leakage-resilience properties
of the individual schemes? If so, constructs with various levels of leakage-resilience
may be composed to obtain new systems that enjoy improved
resilience properties. Some specific examples where this is the case have been
recently exhibited [BCG+11], [BGK11], [GJS11]. What can we say in general?


1.1 Our Contribution 


We propose a new approach for defining leakage-resilience, or rather leakage- 
tolerance, properties of cryptographic protocols. The approach is based on the 
ideal model paradigm and, specifically, on the UC framework. The approach al- 
lows formulating relaxed security properties of protocols in face of leakage and, 
in particular, allows specifying how the security of protocols degrades with leak- 
age. It also allows specifying leakage-tolerant variants of interactive, multi-party 
protocols for general cryptographic tasks. In this context, the new modeling 
also captures attacks that combine leakage with other “network based” attacks
such as controlling the communication and corrupting parties. In addition: 


— We prove a general security-preserving composition theorem with respect to 
the proposed notion. This allows constructing and analyzing protocols in a 
modular way while preserving leakage-tolerance properties. This is a powerful 
tool, given the inherently modularity-breaking nature of leakage attacks. 

— We describe a methodology for constructing leakage-tolerant protocols in 
this framework. Essentially, we show that any protocol that is secure against 
adaptive party corruptions (in some strong sense) is already leakage-tolerant. 

— Using the above methodology and other techniques, we construct composable 
leakage-tolerant protocols for secure channels, commitment, zero-knowledge, 
and honest-but-curious oblivious transfer. (Commitment and zero-knowledge
are realized in the common reference string model.) 


Below we describe these contributions in more detail. 


Leakage-Tolerant Security within the Ideal Model Paradigm. Following the ideal 
model paradigm, we define security by requiring that the protocol $\pi$ at hand
provides the same security properties as in an “ideal world” where processing is
done by a trusted party running some functionality F. Specifically, in the UC
framework, a protocol $\pi$ UC-realizes a functionality F if for any adversary A
there exists a simulator S such that no environment Z can tell whether it is
interacting with A and $\pi$ or with S and F.

We consider a “real world” where the adversary can get leakage on the state 
of any party at any time. As we argued above, such attacks may unavoidably 
degrade the security properties of the protocols at hand and to account for this 
degradation we also allow leakage from the trusted party in the ideal world. 
Specifically, the functionality F defines the “ideal local state” for each party 
and the party’s behavior (and degradation in security) after leakage. (Typically, 
we will be interested in functionalities where the ideal local state includes the 
party’s inputs and outputs, but weaker functionalities that allow joint leakage 
on the inputs of several parties can also be considered.) When A performs a 
leakage measurement L on the state of some party in the real protocol $\pi$, the
simulator S is entitled to a leakage measurement L’ on the ideal local state of
that party in the ideal protocol. We allow the simulator to choose any function
L’, so long as its output length is the same as that of L.

For example, we allow our leaky SMT functionality to leak bits from the message
that it sends and require that a real-world attacker that gets $\ell$ bits of leakage
from the state of the implementation can be simulated by a simulator that learns
only $\ell$ bits about the message. Our model also allows the functionality to react
to leakage, in order to handle situations where security is only maintained as
long as not too much leakage has occurred. (For example, an authenticated-channels
functionality may allow forgeries once the attacker gets more bits of leakage than
the security parameter, but not before that.)


Leakage vs. Adaptive Corruptions for Secure Channels. Consider trying to realize 
leaky SMT in our model using standard encryption; namely, the receiver sends its 
public key to the sender, who sends back an encryption of m. In the ideal world,
the simulator does not witness any communication (and has no information
about the message), so it can simulate the ciphertext by encrypting, say, the all-zero
string, which should be indistinguishable from an encryption of m. However, after 
seeing the ciphertext the adversary A can ask for a leakage query specifying (say) 
the entire secret decryption key and the first bit of m. Although the simulator 
can now ask for many bits of leakage on m, it can no longer modify the ciphertext 
that it sent before and therefore cannot maintain a consistent simulation. 

A similar problem arises in the well studied setting of adaptive corruption 
(with non-erasing parties), where the adversary can adaptively corrupt parties 
throughout the protocol and learn their entire state. Also there, the simulator 
needs to first generate some messages (e.g., the ciphertext) without knowing the 
inputs of the parties (e.g., the message m), and later it learns the inputs and has 
to come up with an internal state that explains the previously-generated mes- 
sages in terms of these inputs. Indeed, it turns out that techniques for handling 
adaptive corruptions can be used to get leakage-tolerance. 

In fact, the problem of secure leaky channels can be solved simply by plugging 
in non-committing encryption (NCE) , which was developed for adap- 
tively secure communication. Recall that an NCE scheme allows generating a 
“fake” equivocal ciphertext $\tilde{c}$ that can later be “opened” as an encryption of any
string of a predefined length $\ell$. Namely, $\tilde{c}$ is generated together with a poly-size
equivocation circuit E, such that, given any message $m \in \{0,1\}^\ell$, E(m) generates
randomness $(r_S^m, r_R^m)$, for both the sender and the receiver, that “explains”
$\tilde{c}$ as an encryption of m.

To obtain leakage-tolerant secure message transmission, we can simply encrypt 
the message using an NCE scheme. The simulator can now generate the fake ciphertext
$\tilde{c}$ with the associated equivocation circuit E and can then translate any
c-dependent leakage function on the entire state (plaintext and randomness) into
a leakage function on the plaintext only, which can be queried to the leaky SMT
functionality. When leakage on $P \in \{S, R\}$ occurs, the simulator S translates
the leakage function $L(m, r_P)$ into $L'(m) = L(m, E(m)) = L(m, r_P^m)$. Indeed,
this idea was used in in the context of a specific protocol. 
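
The translation of leakage queries can be illustrated with a toy example. The sketch below is not a non-committing encryption scheme in the public-key sense discussed above: as a stand-in it assumes a one-time pad with a pre-shared pad, chosen only because equivocation is trivial there. All function names are hypothetical.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(m: bytes, pad: bytes) -> bytes:
    """Toy stand-in scheme: the pad plays the role of the parties' randomness."""
    return xor(m, pad)


def fake_ciphertext(length: int):
    """Simulator side: sample a ciphertext without knowing m, plus an equivocator E."""
    c_fake = secrets.token_bytes(length)

    def equivocate(m: bytes) -> bytes:
        # Randomness that "explains" c_fake as an encryption of m.
        return xor(c_fake, m)

    return c_fake, equivocate


def translate_leakage(leak, equivocate):
    """Turn L(m, r) on the full state into L'(m) = L(m, E(m)) on the plaintext only."""
    return lambda m: leak(m, equivocate(m))


c_fake, E = fake_ciphertext(4)
m = b"abcd"
L = lambda msg, pad: (msg[0] ^ pad[0]) & 1  # one bit that depends on the whole state
L_prime = translate_leakage(L, E)
```

Here `encrypt(m, E(m))` reproduces the fake ciphertext for every message m, and `L_prime` has the same output length as `L`, so the simulator can answer the query by asking the ideal functionality for `L_prime(m)`.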


The General Case. The above example can be made general. Specifically, we
show that, with some limitations, any protocol that realizes a functionality F
under adaptive corruptions also realizes a leaky variant $F^{+\mathrm{lk}}$ under leakage. The
“leaky variant” is a natural adaptation that allows leakage on the state of F,
just like the leaky SMT allows leakage on the transmitted message. This variant,
denoted $F^{+\mathrm{lk}}$, is identical to F except that $F^{+\mathrm{lk}}$ allows the simulator to apply
arbitrary leakage functions to the ideal local state (which is the same as the state
defined in a semi-honest corruption). When such leakage occurs, the environment
is informed of the identity of the leaking party and the number of bits leaked.
(This makes sure that the simulator can only leak the same number of bits as
in the protocol execution.) After such a leakage event, $F^{+\mathrm{lk}}$ behaves in the same
way that F behaves after a semi-honest corruption of that party. That is, if
F modifies its overall behavior following the corruption of a party, then $F^{+\mathrm{lk}}$
modifies its behavior in the same way. (In the applications considered in this
work, we will consider functionalities that do not change their behavior after
semi-honest corruptions, see Section 5.)
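
As an illustration of the leaky variant, the following minimal sketch models a leaky SMT functionality. It is a toy under stated assumptions, not the formal definition: the class and method names are hypothetical, the ideal local state is taken to be just the message, and leakage functions are assumed to return integers that are truncated to the declared output length.

```python
class LeakySMT:
    """Toy leaky secure-message-transmission functionality."""

    def __init__(self):
        self.message = None
        self.environment_log = []  # (party, number of bits) reported per leakage event

    def send(self, message: bytes) -> None:
        self.message = message

    def leak(self, party: str, leak_fn, out_bits: int) -> int:
        """Apply a simulator-chosen function to the ideal local state."""
        value = leak_fn(self.message) & ((1 << out_bits) - 1)
        # The environment learns who leaked and how many bits, not the value.
        self.environment_log.append((party, out_bits))
        return value


f = LeakySMT()
f.send(b"\x80\x01")
first_bit = f.leak("sender", lambda m: m[0] >> 7, 1)
```

Because every leakage event is logged together with its length, a simulator that tried to extract more bits than the real-world adversary would be visible to the environment.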

A limitation of this result is that it only holds when the given proof of security 
uses a restricted type of simulators, namely ones that work “obliviously” of 
the state that they learn when corrupting a player. We call such simulators 
corruption-oblivious. We have: 


Theorem 1.1 (informal). If protocol $\pi$ realizes F under adaptive corruptions
(either semi-honest or Byzantine) with a corruption-oblivious simulator, then it
also realizes $F^{+\mathrm{lk}}$ under arbitrary leakage (and the same type of corruptions).


Composable Leakage-Tolerance. An important property of ideal-model-based
notions of security is that they enable modularity, since the guarantees that they
provide are preserved even under (universal) composition of protocols. That is, if
a protocol $\pi$ realizes an ideal functionality F, the security properties of F carry
over to any environment where $\pi$ is used.

To achieve such modularity, common models of composable security rely cru- 
cially on viewing different sub-modules of a large system as autonomous small 
systems, each with its own local state and well-defined interfaces to the rest of 
the system. Unfortunately, extending this “modular security” paradigm to the 
leaky world is problematic: real world leakage is inherently non-modular, in that 
the adversary can obtain leakage from the joint state of an entire physical device 
and is not bound by our modular separation to logical modules of the software 
running on the device. In fact, it is not even clear how to express joint leakage 
from the state of different modules within standard models, let alone how to 
argue about preservation of security properties. 

We extend the UC security framework to allow expressing leakage 
attacks from physical devices that span multiple logical modules. We first allow 
the protocol analyzer to delineate sets of “jointly leakable modules” (roughly 
corresponding to physical machines). Then, we introduce a new entity, called an 
aggregator, that has access to the internal states of all the modules in each set. 

To get leakage from the joint state of the modules in a set P, the adversary 
sends the leakage function L to the aggregator, who applies L to the combined 
state and returns the result to the adversary. The same mechanism is used to 
obtain leakage from ideal functionalities, except that here the ideal functionality 
F hands the aggregator some “ideal local state” that F associated with the set 
P. We stress again that our model considers a strong adversary that obtains 
leakage information in a non-modular way from multiple subroutines that reside 
on a common device; this makes positive results in this model quite strong.
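
The aggregator mechanism can be sketched as follows. This is a simplified illustration, not the formal model: it assumes that each module carries a party identifier (pid), that joint leakage is permitted only among modules sharing a pid, and that states and leakage outputs are integers. The class and method names are hypothetical.

```python
class Process:
    def __init__(self, name: str, pid: str, state: int):
        self.name, self.pid, self.state = name, pid, state


class Aggregator:
    """Toy aggregator with access to the states of all registered modules."""

    def __init__(self):
        self.processes = {}
        self.environment_log = []  # (leaking modules, number of bits)

    def register(self, proc: Process) -> None:
        self.processes[proc.name] = proc

    def leak(self, names, leak_fn, out_bits: int) -> int:
        procs = [self.processes[n] for n in names]
        # Joint leakage is only allowed from modules on the same device.
        if len({p.pid for p in procs}) != 1:
            raise PermissionError("joint leakage is only allowed within one pid")
        result = leak_fn(*[p.state for p in procs]) & ((1 << out_bits) - 1)
        self.environment_log.append((tuple(names), out_bits))
        return result


agg = Aggregator()
agg.register(Process("enc", "device1", 0b1010))
agg.register(Process("mac", "device1", 0b0110))
agg.register(Process("other", "device2", 0b1111))
joint_bits = agg.leak(["enc", "mac"], lambda s1, s2: s1 ^ s2, 4)
```

The query above combines the states of two logical modules on the same device, which is exactly the kind of leakage that a per-module interface could not express.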

Having extended the model of protocol execution to capture leakage attacks, 
we would like to re-assert the composability property described above, i.e., to 
re-prove the UC composition theorem from in our setting. However, that 
theorem was only proved for systems that behave in a “modular way”, and the 
proof no longer holds in the presence of our modularity-breaking aggregator. 

Still, we manage to salvage much of the spirit of the UC theorem, as follows. 
We formulate a more stringent variant of UC security by putting some technical 
restrictions on the simulator and then re-assert the UC theorem with respect to
this variant. Similarly to the case of corruption-oblivious simulators, here too we
require that the simulator S handles leakage queries “obliviously”.

Roughly, S has a “query-independent” way of translating, via a
state-translation function, real-world leakage queries $L(\mathrm{state}_\pi)$ to ideal-world
leakage queries $L'(\mathrm{state}_F)$. Furthermore, it ignores the leakage results in the
rest of the simulation. We call such simulators leakage-oblivious and show:


Theorem 1.2 (UC-composition with leakage, informal). Let $\rho^F$ be a protocol
that invokes F as a sub-routine. Let $\pi$ be a protocol that UC-emulates F
with a leakage-oblivious simulator. Then the composed protocol $\rho^{\pi/F}$ (where each
call to F is replaced with a call to $\pi$) UC-emulates $\rho^F$ in the face of leakage. Furthermore,
it does so with a leakage-oblivious simulator.


Theorem 1.2 provides a powerful tool in the design of leakage-resilient protocols.
In particular, we later use it to (a) combine any leakage-resilient protocol that 
assumes authenticated communication with a leakage-resilient authentication 
protocol into a leakage-resilient protocol over unauthenticated channels, and 
(b) to combine any leakage-resilient zero-knowledge protocol that assumes ideal 
commitment with leakage-resilient commitment protocols to obtain a composite 
leakage-resilient zero-knowledge protocol. 


Leakage-Tolerant Protocols. We construct leakage-tolerant protocols for a number
of basic cryptographic tasks. We first observe that the general result regarding
the leakage-tolerance of adaptively secure protocols (Theorem 1.1) in fact
guarantees UC security with leakage-oblivious simulators. We then observe that 
existing adaptively secure protocols for secure channels, UC commitment and 
UC semi-honest oblivious transfer already have corruption-oblivious simulators; 
hence, we immediately get: 


— Assume authenticated communication. Then, any non-committing encryption
scheme UC-realizes $F_{\mathrm{SMT}}^{+\mathrm{lk}}$ in the presence of arbitrary leakage using a
leakage-oblivious simulator.

— In the CRS model, the UC commitment protocols of Canetti and Fischlin
and Canetti, Lindell, Ostrovsky and Sahai [CLOS02] UC-realize
$F_{\mathrm{MCOM}}^{+\mathrm{lk}}$ (the leaky version of the multi-instance commitment functionality)
in the presence of arbitrary leakage. Furthermore, they do so with leakage-oblivious
simulators.

— Also in the CRS model, the UC (non-interactive) zero-knowledge protocol
of Groth, Ostrovsky and Sahai [GOS06] realizes $F_{\mathrm{ZK}}^{+\mathrm{lk}}$ under arbitrary leakage.

— The semi-honest oblivious transfer protocol of for adaptive corruptions
UC-realizes $F_{\mathrm{OT}}^{+\mathrm{lk}}$ (the leaky version of the ideal oblivious transfer
functionality) in the presence of arbitrary leakage. Furthermore, it does so
with leakage-oblivious simulators.


In this work, we do not consider the generation of a CRS in the presence of 
leakage; rather, we treat the CRS as an external entity that can be generated in 
a physically separate location. As in other settings, here too it is interesting to
find ways to reduce the setup requirements. 

Finally, we note that for certain functionalities F, applying Theorem 1.1 alone
may still not give an adequate level of leakage-resilience. Indeed, while the leaky
adaptation $F^{+\mathrm{lk}}$ assures graceful degradation of privacy, it may not account for
correctness (or soundness) aspects in the face of leakage. In such cases, we may
need to further strengthen $F^{+\mathrm{lk}}$. One such example is message authentication.
Indeed, $F_{\mathrm{AUTH}}^{+\mathrm{lk}}$ gives essentially no security guarantees: as soon as even a single
bit of information is leaked from the sender, $F_{\mathrm{AUTH}}^{+\mathrm{lk}}$ behaves as if the sender
is fully corrupted, in which case forgery of messages is allowed. We thus first
formulate a variant of $F_{\mathrm{AUTH}}$ that guarantees authenticity as long as the number
of bits leaked is less than some threshold B. We then realize this functionality,
denoted $F_{\mathrm{AUTH}}^{B}$, assuming an initial k-bit shared secret key between the parties
and as long as at most B = O(k) bits leak between each two consecutive message
transmissions. Furthermore, we do this with a leakage-oblivious simulator. The
techniques used to realize $F_{\mathrm{AUTH}}^{B}$ include information-theoretic leakage-resilient
message authentication codes, as well as NCE schemes.
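
The behavior of a threshold-based authentication functionality can be sketched as follows. This is a rough illustration under simplifying assumptions, not the functionality defined in this work: the sketch keeps a single cumulative leakage counter and does not model the per-transmission accounting mentioned above, and the class and method names are hypothetical.

```python
class ThresholdAuth:
    """Toy authenticated channel that tolerates leakage below a threshold B."""

    def __init__(self, threshold_bits: int):
        self.threshold = threshold_bits
        self.leaked = 0  # ASSUMPTION: one cumulative counter for the sender

    def record_leak(self, n_bits: int) -> None:
        self.leaked += n_bits

    def deliver(self, message: bytes, forged: bool = False) -> bytes:
        # Authenticity is guaranteed while fewer than B bits have leaked.
        if forged and self.leaked < self.threshold:
            raise PermissionError("forgery rejected: leakage below threshold")
        return message


auth = ThresholdAuth(threshold_bits=8)
auth.record_leak(7)
```

With 7 of 8 bits leaked a forged message is still rejected, whereas the plain leaky adaptation would already treat the sender as fully corrupted after the first leaked bit.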

We note that the techniques here borrow strongly from the techniques used in 
for the related goal of authentication within the context of obfuscation 
with leaky hardware. That work, however, analyzed these tools in an ad-hoc 
manner, and the results there apply only to that specific context. 

In contrast, using the above UC theorem with leakage, we can combine the 
above authentication protocol with any protocol that assumes ideally authenti- 
cated communication to obtain composite leakage-tolerant protocols that with- 
stand unauthenticated communication. 

Finally, we address the task of obtaining zero-knowledge from ideal leaky
commitment $F_{\mathrm{MCOM}}^{+\mathrm{lk}}$ (the adaptive NIZK protocol of [GOS06] is obtained from
specific number-theoretic assumptions on bilinear groups). At first it may seem
that, as in the case of commitment, existing protocols for UC-realizing the ideal
zero-knowledge functionality, $F_{\mathrm{ZK}}$, would work also in the case of leakage. However,
this turns out not to be the case. In particular, while the protocol of
for UC-realizing $F_{\mathrm{ZK}}^{R}$, for some relation R, given $F_{\mathrm{MCOM}}$ is indeed secure against
adaptive corruptions, the simulator turns out not to be corruption-oblivious and
Theorem 1.1 does not apply.

Instead, we settle for UC-realizing, in the presence of leakage, a weaker variant
of $F_{\mathrm{ZK}}^{+\mathrm{lk}}$. This weaker variant permits violation of the soundness requirements
if too many bits were leaked from the verifier. We denote this weaker version
by $F_{\mathrm{ZK}}^{R,B}$, where B is the leakage threshold for the verifier. We show how to
UC-realize $F_{\mathrm{ZK}}^{R,B}$ for $B = k - \omega(\log k)$ (where k is the security parameter),
given access to $F_{\mathrm{MCOM}}^{+\mathrm{lk}}$. Using the (leaky) universal composition theorem and
the protocol for realizing $F_{\mathrm{MCOM}}$ (mentioned above), we obtain a protocol for
UC-realizing $F_{\mathrm{ZK}}^{R,B}$ in the CRS model.


Concurrent Work. Garg, Jain, and Sahai also investigate zero-knowledge 
in the presence of leakage, albeit not in the UC setting. Instead, they consider a 
stand-alone definition with a rewinding simulator (where a CRS is not needed). 
Some of the difficulties that emerge in standard 3-round zero-knowledge protocols,
as well as the suggestion to overcome them using the Goldreich-Kahan
paradigm, were communicated to us by Amit Sahai.

Damgård, Hazay, and Patra consider leakage-resilient two-party
protocols. Their definition of security, which is also ideal-model based, accounts 
also for noisy leakage (namely leakage that might not be length-restricted, but is 
somewhat entropy preserving). They achieve leakage-resilience (or tolerance) for 
$\mathrm{NC}^1$ functions in a setting where one party is statically and passively corrupted
and the other party is leaky. The result, however, only applies in the “only
computation leaks” (OCL) model of [MR04] (and with some extra technical
limitations). They also prove a security preserving composition theorem, but 
their modeling considers only separate leakage from each module (rather than 
overall leakage as considered here). They also construct a leakage-tolerant OT 
protocol for sufficiently entropic input distributions, but only in the OCL model
and under a relatively strong hardness assumption; in terms of communication, 
however, their protocol is more efficient than ours. Finally, we remark that the 
setting where one party is statically passively corrupted can be seen as a special 
case of a weak leakage-tolerance model, where the ideal world simulator is allowed 
to jointly leak from all the parties. See further discussion in Section 


2 Modeling Leakage in the UC Framework 


This section defines the new model of UC security with leakage. Here we provide
a high-level overview; the full details can be found in the full version of this
work [BCH11]. Recall that the basic UC framework considers realization of an
“ideal specification” F by a “real implementation” $\pi$. (Formally, both F and
$\pi$ are just protocols; we call them by different names to guide the intuition.)
The realization requirement is that for any “real world attacker” A against the
implementation $\pi$ there exists another adversary S (called a simulator) against
the specification F, such that an “environment” Z that interacts with S, F has
essentially the same view as in an interaction with A, $\pi$.

The basic UC execution model lets the environment Z determine the inputs to 
the parties running the protocol and see the outputs generated by these parties 
and also allows free communication between the environment and the adversary. 
The adversary, typically, has full control over the communication between parties 
and the ability to “corrupt” parties in various ways. Corruption is modeled as 
just another interface available to the adversary, where it can send a message 
“you are corrupted” to any party. (In the case of standard passive corruption, 
the party responds to this message by handing its entire internal state to the 
adversary. To model Byzantine corruption, the party also changes the program 
that it is running from then on.) 

A crucial aspect of the UC framework is its modularity, where programs can 
call subroutines, and these subroutines are treated as separate entities that can 
be analyzed separately for security properties. Importantly, local randomness 
and secrets that are used by a subroutine should typically not be visible to the 
calling routine or to other components in the system. 

A useful technicality in the UC framework is that it is sufficient to prove
security only with respect to the dummy real world adversary D. This is the 
adversary that simply reports all the information it receives to the environment 
and follows all the instructions of the environment regarding sending messages 
to parties and ideal functionalities. Relying on the fact that any adversary can 
be emulated by the environment itself, it is easy to show that simulation of the 
dummy adversary D implies simulation for any adversary. 


Leaky UC. A natural approach to modeling leakage within the UC framework 
is to view it as a weak form of corruption, where the adversary gets some in- 
formation about the internal state of the leaky party but perhaps not all of it. 
Also, leakage resembles “semi-honest” corruption more than “malicious”, in that 
leaky parties keep following the same protocol and do not change their behavior 
following a leakage event. Thus we could provide yet another interface to the 
adversary where it can send a “leak L” message to a party (where L is some 
function) and have that party reply with L(s) where s is its internal state. 


The Leakage Aggregator. A serious shortcoming of the modeling approach 
in the previous paragraphs is that it only lets the adversary obtain leakage on 
individual processes (or subroutines). In contrast, real life leakage usually pro- 
vides information that depends on the entire state of a physical device, including 
all the processes that are currently running on it. To account for this inherently 
non-modular property of real life leakage, we introduce to the model a new 
“global entity” that we call the leakage aggregator. The aggregator G can access 
the entire internal state of all the components in the system. A leakage query 
specifies a leakage function L and a set of processes P = {p_1, ..., p_t}. This query
is forwarded to the aggregator, who evaluates L(s_1, ..., s_t), where s_i is the
internal state of p_i, and returns the result
to the adversary. Some important technicalities regarding the working of G are 
the following: 


— A convention should be set for how to specify the sets of processes and ensure 
that this is a “legitimate set” for joint leakage. We assume that processes 
are tagged with “party identifiers” pid (roughly corresponding to physical 
machines), and joint leakage is allowed only from a set of processes that all 
have the same pid. 

— As done for corruptions, here too the identity of the leaky processes and the 
amount of leakage needs to be reported to the environment. This forces the 
simulator, in the ideal world, to use the same amount of leakage from the 
same processes as in the real world. 

— Since ideal functionalities represent idealized constructs that do not necessar- 
ily run on physical devices, they are often associated with more than one pid. 
Thus care should be taken when deciding how an ideal functionality reacts 
to leakage queries w.r.t. one of its pid’s. (For example, the secure-channels 
functionality runs on behalf of both the sender and the receiver, and would 
typically react differently to sender-leakage than to receiver-leakage queries.)
We let the ideal functionality itself decide how to reply when G asks it for 
the state corresponding to any of its pid’s. (This is the same convention as 
used for corruption, where the functionality gets to decide what to reveal
to the adversary when one of its pid’s is corrupted.) Typically, the “state” 
associated with a certain pid will be just the inputs that were received from 
that pid and the outputs it receives. 

— To allow functionalities to react to leakage situations, we have G, upon ac- 
cessing the state of a module, report to that module the output size of the 
leakage function L. Typically, “real world implementations” ignore this re- 
port (since we assume that real world leakage is undetectable), but “ideal 
functionalities” may use it to change their behavior (e.g., reduce the security 
guarantee if too much leakage occurred). 


With these conventions in place, a leakage operation is handled as follows: first,
the adversary sends a query (leak, L, pid) to G, where L is the leakage function
and pid is the leaking party ID. Then, G obtains state_pid, the total state of party
pid, applies L to state_pid, and returns the result to A. Finally, G reports the
output length of the function L to all the processes whose state is included in
state_pid and reports pid and the output length to the environment.
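
As an informal illustration only, the handling of a leakage query can be sketched in Python as follows. The class and method names (`Process`, `Aggregator`, `notify_leak`, `env_log`) are hypothetical and not part of the formal model; states are modeled as byte strings and the output length is counted in bits.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Process:
    """A process (or subroutine) tagged with the pid of the machine it runs on."""
    pid: str
    state: bytes
    leak_reports: List[int] = field(default_factory=list)

    def notify_leak(self, out_len: int) -> None:
        # A real-world implementation would ignore this report;
        # an ideal functionality may use it to change its behavior.
        self.leak_reports.append(out_len)


class Aggregator:
    """Toy leakage aggregator G that can read the state of every process."""

    def __init__(self, processes: List[Process]) -> None:
        self.processes = processes
        # What the environment is told: (pid, output length in bits).
        self.env_log: List[Tuple[str, int]] = []

    def leak(self, leak_fn: Callable[[List[bytes]], bytes], pid: str) -> bytes:
        # Joint leakage is allowed only from processes sharing the same pid.
        targets = [p for p in self.processes if p.pid == pid]
        result = leak_fn([p.state for p in targets])
        out_len = 8 * len(result)
        # Report the output length to every process whose state was included,
        # and report pid together with the output length to the environment.
        for p in targets:
            p.notify_leak(out_len)
        self.env_log.append((pid, out_len))
        return result
```

Note that in this sketch every process of the leaking party is told the full output length, which mirrors the convention that each component behaves as if all the leaked bits came from it.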

We note that the security guarantee provided by this model may be weaker 
than one could desire, as the number of leaked bits is reported to each one of 
the processes (or functionalities). This means that when a domain leaks ℓ bits,
each one of its components behaves as if the ℓ bits leaked entirely from this
component. While this is a relatively weak leakage-resilience guarantee, it seems 
unavoidable in any general model with modularity-breaking leakage. 


Leakage-Oblivious Simulation. Following the approach of basic UC security, the
definition of protocol emulation requires that for any adversary A that attacks
the implementation π there exists a simulator S that attacks the specification
F so that no environment can distinguish between an interaction with A and
π, and an interaction with S and F. In particular, S must provide an overall
transformation from one interaction scenario to the other, including among oth- 
ers the leakage queries made by A to the parties (via the aggregator). As noted 
above, an equivalent requirement considers, instead of any adversary A, only 
the dummy adversary, D, that merely passes messages between the environment 
and the protocol’s parties. 

This natural requirement, however, has (seemingly inherent) difficulties when 
considering composition of protocols. In particular, we were not able to prove 
a general composition theorem in this model (see details in Section 3). Consequently,
we consider a more restricted notion of protocol emulation, which we
term emulation with leakage-oblivious simulators. 

To simplify the exposition, we describe here leakage-oblivious simulation only
with respect to the dummy adversary D. A leakage-oblivious simulator S for the
dummy adversary has a special form: specifically, S has a separate subroutine Š
for handling leakage. When S receives from the environment a request to apply
a leakage function L to a set P of processes, Š is invoked to produce a “state
translation” function T. This function is meant to transform the internal state of
P in the specification F into “the actual state” in the implementation π. Once T
is produced, the aggregator is given the composed leakage function L ∘ T. Finally,
when the leakage-result is returned, it is forwarded directly to the environment
and S returns to its state prior to the leakage event.

The subroutine Š should operate independently of the leakage function
L: its only inputs are the state of S (prior to the leakage query) and a party
identifier pid. Also, the leakage operation has no side effects on S. That is,
following the leakage event, S returns to the state that it had before that event.
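
The control flow of a leakage-oblivious simulator can be sketched as follows. This is a toy Python illustration under simplifying assumptions: states are dictionaries, the aggregator is modeled as a callable taking a leakage function and a pid, and the names `translate`, `handle_leak` and the fields `pad`, `input`, `real_view` are hypothetical.

```python
import copy
from typing import Callable, Dict

State = Dict[str, int]
LeakFn = Callable[[State], int]
Translation = Callable[[State], State]


class LeakageObliviousSimulator:
    def __init__(self, sim_state: State) -> None:
        self.sim_state = sim_state

    def translate(self, pid: str) -> Translation:
        """State-translation subroutine: sees only the simulator state and pid,
        never the leakage function L."""
        pad = self.sim_state["pad"]
        # Any scratch work done here is rolled back by handle_leak.
        self.sim_state["scratch"] = 1
        return lambda ideal: {"real_view": ideal["input"] ^ pad}

    def handle_leak(self, leak_fn: LeakFn, pid: str,
                    aggregator: Callable[[LeakFn, str], int]) -> int:
        snapshot = copy.deepcopy(self.sim_state)
        T = self.translate(pid)
        # The aggregator is given the composed function L o T.
        composed = lambda ideal_state: leak_fn(T(ideal_state))
        result = aggregator(composed, pid)
        # The leakage event has no side effects on the simulator.
        self.sim_state = snapshot
        # The result is forwarded to the environment unchanged.
        return result
```

The point of the structure is that `translate` is fixed before `leak_fn` is ever inspected, so the same translation would be produced for any leakage function.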


3 Universal Composition of Leaky Protocols 


We now state the universal composition theorem for leaky protocols and leakage-
oblivious simulators (as defined in Section 2). Let π be an implementation and F
be a specification. (As mentioned earlier, formally these are just two protocols,
and the different names are meant only to help the intuition.) Also let ρ = ρ[π]
be a protocol that includes subroutine calls to π. Below we denote by ρ^π the
system where the subroutine calls to π are actually processed by π and by ρ^{π/F}
the system where these subroutine calls are processed by F.

The UC theorem states that if π UC-realizes F, then ρ^π UC-realizes
ρ^{π/F}; however, that theorem does not hold in the presence of the modularity-
breaking aggregator G. The proof of the UC theorem relies on all the
processes being “modular”; namely, a process can only interact with its caller
and its subroutines (and the adversary).¹

As we have seen, modularity is incompatible with the definition of leaky pro- 
tocols; indeed, all processes are required to interact with the aggregator, which 
is neither their caller nor their subroutine (nor an adversarial entity). Still, if π
realizes F with a leakage-oblivious simulator, we can recover the same result. 
Below we call a protocol “modular up to leakage” if it only interacts with its 
caller, its subroutines, the adversary, and the aggregator. 


Theorem 3.1 (UC-composition with leakage). Let ρ, π, F be protocols as
above, all modular up to leakage, such that π UC-emulates F with a leakage-
oblivious simulator. Then ρ^π UC-emulates ρ^{π/F}. Furthermore, it does so with a
leakage-oblivious simulator.


Proof Overview. The proof follows the outline of the proof of the basic UC
theorem; here, we focus on the required adjustments due to the leakage. For the sake
of simplicity, in this overview we assume that ρ invokes only a single instance of
the sub-protocol π.

Recall that we need to construct a leakage-oblivious simulator S_ρ, such that no
environment can tell whether it is interacting with ρ^π and the dummy adversary
D, or with ρ^{π/F} and S_ρ. The construction of S_ρ is naturally based on the leakage-
oblivious simulator S_π, as guaranteed by the premise. That is, S_ρ runs a copy
of S_π; as in the basic UC theorem, the interaction between Z and the parties
is separated into two parts. The interaction with π is dealt with by S_π, which
generates messages for the corresponding sub-parties and handles incoming mes-
sages from these parties. The effect of the environment on the rest of the system is
handled by direct interaction with the external parties running ρ.

1 Such protocols are also called “subroutine respecting”.

Leakage queries are handled by way of a subroutine Š_ρ that generates a state
translation T_ρ, as needed for leakage-oblivious simulation. Recall that the leakage
function L that S_ρ receives from the environment was designed to be applied
to a “real protocol state” in ρ^π (and since ρ runs a single copy of π, this
state is of the form (state_ρ, state_π)). The simulator S_ρ, on the other hand, can
only ask the aggregator for leakage on the state of ρ^{π/F}, which is of the form
(state_ρ, state_F). To bridge this gap, S_ρ runs the “state translation subroutine”
Š_π. (This can be done since S_ρ has the entire current state of S_π.) Once Š_π
produces a state translation function T_π, S_ρ generates its own state translation
function T_ρ(state_ρ, state_F) = (state_ρ, T_π(state_F)) and sends to the aggregator a
leakage function L', where

    L'(state_ρ, state_F) = L(T_ρ(state_ρ, state_F)) = L(state_ρ, T_π(state_F)).


Observe that already at this stage we rely crucially on S_π being leakage-oblivious:
if S_π was expecting to see a leakage function L_π(state_π) before producing the
translation, then we could not use it (since S_ρ does not know the state state_ρ, and
therefore cannot write the description of the induced function L_{state_ρ}(state_π) =
L(state_ρ, state_π)). Once the aggregator returns an answer, S_ρ passes it to the
environment and returns to its previous state (including the previous state of
the sub-simulator S_π).
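
The way the outer simulator lifts the inner state translation and composes it with the leakage function can be written down directly. The following Python sketch is illustrative only; the function names `lift_translation` and `compose_leakage` are hypothetical, and states are arbitrary Python values.

```python
from typing import Any, Callable, Tuple

Translation = Callable[[Any], Any]


def lift_translation(t_pi: Translation) -> Callable[[Any, Any], Tuple[Any, Any]]:
    """Build T_rho(state_rho, state_F) = (state_rho, T_pi(state_F))."""
    def t_rho(state_rho: Any, state_f: Any) -> Tuple[Any, Any]:
        # The state of the calling protocol rho is passed through untouched;
        # only the sub-protocol's ideal state is translated.
        return state_rho, t_pi(state_f)
    return t_rho


def compose_leakage(leak_fn: Callable[[Any, Any], Any],
                    t_pi: Translation) -> Callable[[Any, Any], Any]:
    """Build L'(state_rho, state_F) = L(state_rho, T_pi(state_F))."""
    t_rho = lift_translation(t_pi)
    def leak_prime(state_rho: Any, state_f: Any) -> Any:
        return leak_fn(*t_rho(state_rho, state_f))
    return leak_prime
```

Observe that `compose_leakage` never needs the value of the calling protocol's state when it obtains `t_pi`, which is exactly why the inner simulator has to produce its translation without seeing the leakage function.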

It is clear from the description that S_ρ is leakage-oblivious. The validity of
S_ρ is shown by reduction to the validity of S_π. That is, given an environment
Z_ρ that distinguishes an execution of (ρ^π, D) from an execution of (ρ^{π/F}, S_ρ),
we construct an environment Z_π that distinguishes an execution of (π, D) from
an execution of (F, S_π). The environment Z_π simulates an execution of (Z_ρ, ρ)
“in its head”, except that all messages corresponding to π are forwarded to the
external execution. Indeed, leakage queries aside, we have: (a) if the external
execution consists of S_π and F, then the entire (composed) execution amounts
to running Z_ρ with S_ρ and ρ^{π/F}; (b) if the external execution consists of D and
π, then the entire (composed) execution amounts to running Z_ρ with D and ρ^π.

Extending this argument to include leakage, the environment Z_π acts as fol-
lows. When Z_ρ produces a leakage query L to be evaluated on (state_ρ, state_π),
Z_π computes the simulated state state_ρ and computes the restricted leakage
function L_{state_ρ}(state_π) = L(state_ρ, state_π), which should be evaluated only on
state_π. Note that since S_π is leakage-oblivious, the state-translation function that
it outputs when run as a subroutine of S_ρ is the same as the state-translation
function that it outputs when run with the environment Z_π. The rest of the
argument remains unchanged.

The actual proof also deals with the case where multiple instances of the sub-
routine π are invoked and can be found in the full version of this work [BCH11].


4 From Adaptive Security to Leakage-Tolerance


Recall that the adversary in the UC framework can adaptively corrupt parties 
during protocol execution, thereby learning their entire internal state. If the cor- 
ruption is passive (semi-honest), the party keeps following the same program as 
it did before the corruption, and if it is Byzantine (malicious), then the adversary 
also gains control of the program that the party runs from now on. 

As already pointed out, leakage can be thought of as a form of corruption, 
where the adversary gains partial information on the inner state of a party. 
The converse is also true: passive corruption can be viewed simply as leaking
the entire internal state. The challenges in simulation are also similar: for both 
corruption and leakage the simulator must translate some “ideal state” that it 
gets from the functionality into a “real state” that it can show the environment, 
and do it in a way that is consistent with the transcript so far. Below we for- 
malize this similarity, showing that “in principle” a protocol that realizes some 
functionality F in the presence of passive adaptive corruptions also realizes it 
in the presence of leakage. There are considerable restrictions, however. Most 
importantly, the implication holds only for corruption-oblivious simulators (see 
below). Also, F must be adapted to handle leakage queries, and we prove the 
implication for a particular (natural) way of doing this adaptation. 


Adapting Functionalities to Leakage. Let F be a functionality that was designed
for a leakage-free model with corruptions. This means that F already has some
mechanism to reply to messages from the adversary about corruptions of players.
We now need to adapt it by explaining how it reacts to leakage queries from
the aggregator G. The adaptation is natural: whenever G asks for the state of
party pid for the purpose of leakage, the functionality replies with exactly the
same thing that it would have given the adversary if pid was passively corrupted
at this time. Then, once G reports the number of leakage bits, the functionality
forwards this number on the I/O lines of party pid.² Thereafter, the functionality
behaves just as if party pid was passively corrupted. We denote the resulting
functionality by F^{+lk}. We stress that if F was designed to react differently to
passive and Byzantine corruptions, then it uses the passive corruption mode to
handle leakage.

Note the implication of viewing leakage as corruption. In principle, the reaction
to leakage could be gradual: a functionality F could change its behavior propor-
tionally to the amount of leakage, or have a leakage threshold up to which it
does one thing and after which it does another. However, the reaction of F to
(passive) corruption is typically “all or nothing”: it is either not affected or it
completely “gives up”. Using our convention from above, this “all or nothing”
reaction is carried over to F^{+lk}. For example, if F is an authenticated channels
functionality, then F^{+lk} will permit forgery as soon as even a single bit is leaked.
On the other hand, if F is a commitment functionality then leakage events have
no effect on the subsequent behavior of F^{+lk}. Although the transformation that
we prove below works for every functionality F, its usefulness depends crucially
on the way F handles passive corruptions.

2 This number-reporting action is meant to allow the environment to do its leakage
bookkeeping, and for ideal functionalities to be able to react to leakage.
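
To make the adaptation concrete, the following Python sketch wraps a toy authenticated-channels functionality so that leakage is handled through its passive-corruption interface. All names (`AuthChannelF`, `LeakyAdaptation`, `corruption_view`, `io_lines`) are hypothetical, and the toy functionality is a simplification assumed here for illustration, not the formal definition.

```python
from typing import Dict, List, Set


class AuthChannelF:
    """Toy functionality with an 'all or nothing' reaction to passive corruption."""

    def __init__(self) -> None:
        self.inputs: Dict[str, List[str]] = {}
        self.passively_corrupted: Set[str] = set()

    def send(self, pid: str, msg: str) -> None:
        self.inputs.setdefault(pid, []).append(msg)

    def corruption_view(self, pid: str) -> List[str]:
        # What the adversary would see if pid were passively corrupted now.
        return list(self.inputs.get(pid, []))

    def passive_corrupt(self, pid: str) -> List[str]:
        self.passively_corrupted.add(pid)
        return self.corruption_view(pid)

    def allows_forgery(self, pid: str) -> bool:
        return pid in self.passively_corrupted


class LeakyAdaptation:
    """Wrapper that answers the aggregator using the passive-corruption mode."""

    def __init__(self, f: AuthChannelF) -> None:
        self.f = f
        self.io_lines: Dict[str, List[int]] = {}

    def state_for_leakage(self, pid: str) -> List[str]:
        # Same reply as for a passive corruption of pid at this time.
        return self.f.corruption_view(pid)

    def report_leakage(self, pid: str, num_bits: int) -> None:
        # Forward the number of leaked bits on the I/O lines of pid,
        # then behave as if pid were passively corrupted.
        self.io_lines.setdefault(pid, []).append(num_bits)
        self.f.passive_corrupt(pid)
```

In this sketch a single reported bit of leakage already flips `allows_forgery`, matching the "all or nothing" behavior described above for authenticated channels.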

Corruption-Oblivious Simulators. The intuition for why adaptive corruption im- 
plies leakage-tolerance is that if we can simulate the entire state of an adaptively 
corrupted party, then we should also be able to simulate only parts of its state 
(according to a particular leakage function). The problem with this intuition, 
however, is that future behavior of the simulator may depend on the entire state 
learned during corruption, which is not available to the leakage simulator. 

We thus restrict our attention to special simulators that are oblivious of the 
state learned during corruption (similarly to the leakage-oblivious simulators 
from Section 2). As for leakage, we only define corruption-oblivious simulators 
for the dummy adversary D (which is sufficient). The simulator S for D should
have a special subroutine Š for handling passive corruptions. When S receives
from the environment a request (passive corrupt, pid) to passively corrupt a party
pid, S invokes Š to produce a state translation function T. T is used to transform
the “internal state” that F (or the hybrid-world protocol) returns for party pid
into a state of the “real world” implementation protocol π for this party. Then,
S sends a passive corrupt message to pid, obtains the corresponding state (from
F or the hybrid-world instance), applies to it the transformation T and returns
the result state_π = T(state) to the environment. After the result is forwarded to
the environment, S returns to its state prior to the time it invoked Š.

Note that since this is passive corruption, party pid can keep evolving its
state after the initial corruption, and the environment can ask to see the updated
state from time to time. S handles each such update request as a new passive
corruption query, invoking Š again to get a state-translation function, calling the
functionality again, etc. (We note that there is no restriction on the way that S
handles Byzantine corruptions.)

We stress that S does not make any direct use of the state of the corrupted 
parties. In particular, the future operation of S, when simulating the messages 
generated by corrupted parties, is done independently of their secret local states. 
As seen in subsequent sections, in some cases this turns out to be a strong 
restriction (see Example 5.1 in Section 5). We are now ready to state the main
result of this section. The proof is provided in the full version [BCH11].


Theorem 4.1. Let π be a protocol that UC-realizes an ideal functionality F in
the presence of passive adaptive corruptions (but no leakage), with a corruption-
oblivious simulator. Then π also UC-realizes F^{+lk} with a leakage-oblivious sim-
ulator in the UC model with leakage.

Composition of Corruption-Oblivious Simulators. We note that, viewing 
corruption-oblivious simulators as a special case of leakage-oblivious simulators 
(for leaking the identity function), the proof of the leaky UC Theorem 3.1 implies
that corruption-oblivious simulation is preserved under universal composition: 


Corollary 4.1 (of Theorem 3.1). Let ρ, π, F be protocols that are modular
up to leakage, such that π UC-emulates F with a corruption-oblivious simulator.
Then ρ^π UC-emulates ρ^{π/F} with a corruption-oblivious simulator.


5 Realizing Leaky Adaptations of Basic Interactive Tasks


This section describes the construction of leakage-tolerant protocols for sev-
eral interactive tasks. We describe constructions for secure message transmis-
sion, (semi-honest) oblivious transfer, commitment and zero-knowledge. These
constructions all assume ideal authenticated channels. We then present a con-
struction of leakage-resilient authenticated channels. All of our constructions
are composable. We conclude the section with a discussion on the difficulties in
obtaining general leakage-tolerant multi-party computation.

The bulk of this section is omitted. It can be found in the full version of this 
work [BCH11]. Here, we only sketch the constructions for the last two tasks.


5.1 Zero-Knowledge from Ideal Leaky Commitments 


We adapt the zero-knowledge ideal functionality to tolerate leakage and demon-
strate a protocol that realizes the adapted functionality in the presence of leak-
age. Recall that F_ZK^R, for a relation R, takes from the prover P an input (x, w),
and outputs x to the verifier V only if R(x, w) holds. This formulation guarantees
to P perfect secrecy of w. It also guarantees perfect soundness to V. 

Adapting F_ZK to leakage, we can ideally hope to realize a functionality with
optimal tolerance, such as F_ZK^{+lk}, which can “gracefully” tolerate arbitrary leak-
age from the prover, and in addition does not give up on soundness even in the face
of arbitrary leakage on the verifier. However, we could not manage to realize 
such a functionality. Instead, we consider an adaptation that can tolerate arbi- 
trary leakage from the prover, but only a bounded amount of leakage from the 
verifier before soundness breaks. Before presenting our eventual adaptation and 
implementation, we briefly sketch the difficulties which prevent us from achieving 
optimal leakage-tolerance. 

As shown in [CLOS02], the parallel repetition of classic 3-round zero-
knowledge protocols, such as Blum’s Hamiltonian cycle [Blu86], and GMW’s
3-coloring [GMW91], UC-realizes the basic (non-leaky) F_ZK given access to
(non-leaky) ideal commitment. Moreover, they do so even in the presence of
adaptive corruptions. However, the proofs of security of these protocols do not
yield corruption-oblivious simulation. Thus, we cannot conclude that these pro-
tocols UC-realize F_ZK^{+lk} under leakage.

In fact, without any modifications, these protocols seem inherently impossi-
ble to simulate in the face of leakage. To demonstrate this, let us recall GMW’s
3-coloring protocol. Here, the prover, who possesses a 3-coloring c, chooses a
random permutation σ of the three colors and commits to the permuted color-
ing σ(c). The verifier then requires that the prover opens the colors of a random
edge and checks that its endpoints are indeed colored differently. Now, consider
a (Byzantinely) corrupted verifier V* that also obtains leakage on the prover’s
coloring during the protocol. This verifier can leak, for example, the secret per-
mutation σ and then ask the (honest) prover to open the colors σ(c(i)), σ(c(j))
of some random edge (i, j). Finally, it can leak again the true colors c(i), c(j).
Simulating such a behavior seems impossible (assuming 3COL ∉ BPP). Indeed,
once the simulator simulates σ for the first leakage, it essentially becomes com-
mitted to it for the rest of the protocol. Then, when it is required to simulate
the opening of σ(c(i)), σ(c(j)), it essentially has no information on c, and hence,
if it can consistently simulate the second leakage query, then essentially it must
“know” a proper coloring of the entire graph.³ We stress that this inherent dif-
ficulty also defeats simulators that are not leakage-oblivious (and are thus allowed
to depend on both the leakage function and the leakage-result).

To overcome the above problem, we require that at the beginning of the proto- 
col, the verifier commits to all its challenges. This already allows the simulation 
to go through; now the simulator can first extract the challenge edge (i, j), choose 
random colors for it c'(i), c'(j), and then have the leakage return a permutation
mapping the real c(i), c(j) to c'(i), c'(j). In fact, we show that this adjustment
is enough for simulating any malicious verifier. 
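
The simulation idea for a single challenge edge can be illustrated with a short Python sketch. It is a toy under stated assumptions: colors are the integers 0, 1, 2, a permutation is a dictionary, and the function names `simulate_opening` and `consistent_permutation` are hypothetical. The second function plays the role of the leakage function that, when evaluated on the real colors, returns a permutation consistent with what the simulator already showed.

```python
import random
from typing import Dict, Tuple

COLORS = (0, 1, 2)


def simulate_opening(rng: random.Random) -> Tuple[int, int]:
    """Pick two distinct random colors c'(i), c'(j) for the challenge edge."""
    shown_i, shown_j = rng.sample(COLORS, 2)
    return shown_i, shown_j


def consistent_permutation(c_i: int, c_j: int,
                           shown_i: int, shown_j: int) -> Dict[int, int]:
    """Return the unique permutation of COLORS mapping the real colors
    c(i), c(j) to the simulated colors c'(i), c'(j)."""
    if c_i == c_j or shown_i == shown_j:
        raise ValueError("edge endpoints must have distinct colors")
    # The third color is forced once two images are fixed.
    third_real = next(c for c in COLORS if c not in (c_i, c_j))
    third_shown = next(c for c in COLORS if c not in (shown_i, shown_j))
    return {c_i: shown_i, c_j: shown_j, third_real: third_shown}
```

Because the challenge edge is known before any color is shown, the permutation can be chosen after the fact, which is what fails in the unmodified protocol.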

This adjustment comes, however, at a price: unlike the original protocols, 
where the verifier was of the public coins type (and had no secret state), now 
the verifier commits to its challenges, and the secrecy of these challenges is crucial 
for the protocol’s soundness. Hence, we cannot hope that in such a protocol the 
verifier will be able to withstand arbitrary amounts of leakage; in particular, 
once the prover leaks all of the verifier’s challenge, soundness is doomed. 

Consequently, we only realize a weaker adaptation, where the verifier can only 
tolerate a bounded amount of leakage. (The prover can still tolerate arbitrary 
leakage.) More specifically, we can tolerate arbitrary leakage on the verifier’s ran-
domness so long as a super-logarithmic amount of min-entropy is maintained.


5.2 Authenticated Channels 


We construct a protocol for realizing leaky authenticated channels with bounded 
leakage-resilience. More specifically, the protocol UC-realizes an ideal function- 
ality Fee that guarantees authenticated communication as long as the overall 
leakage between any two transmissions of some messages does not exceed a pre- 
specified bound B. 

The protocol we present uses two main building blocks: (a) non-committing
encryption (NCE), and (b) information-theoretic c-time message authentication codes
(MACs) that are resilient to a constant leakage rate from the secret key. The
idea behind the protocol is simple. The parties initially share a (leaky) secret
key K_1. Then the protocol proceeds inductively; at each round, a current au-
thentication key K_i is used to authenticate the i-th message, m_i. In addition,
a fresh key K_{i+1} is generated and transmitted using non-committing encryp-
tion. These transmitted ciphers are also authenticated using K_i. To allow the
authentication to go through, we need our underlying leaky MAC scheme to
allow authentication of messages that are polynomially longer than the secret
key. This is achieved using universal hashing. Concretely, the protocol we present
tolerates, between each two transmissions, roughly k/10 bits of leakage on the
k-long secret key. Similar techniques are used for a related goal in [BCG+11].
However, the security analysis there is different from the one here.

3 This intuition can be made formal; namely, given such a simulator we can construct
an algorithm for 3-coloring arbitrary graphs.

The protocol we construct admits leakage-oblivious simulation and is thus 
composable. We can, therefore, use it as a basic building block supporting any 
protocol that requires authenticated channels, when ideally authenticated chan-
nels are unavailable. We stress, however, that when doing so the leakage-tolerance
of the higher-level protocol naturally degrades to that of the authentication
protocol.
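
The key-chaining structure of such a protocol can be sketched in Python as follows. This is only a structural toy under strong assumptions: the MAC is a textbook polynomial-evaluation MAC over a prime field (not claimed to be leakage-resilient), the non-committing encryption is replaced by an insecure placeholder that sends the key in the clear, the sender is assumed to be the party that samples the fresh key, and all function names are hypothetical.

```python
import secrets
from typing import List, Tuple

P = 2**127 - 1  # a Mersenne prime used as the field size

Key = Tuple[int, int]
Packet = Tuple[List[int], List[int], int]


def mac(key: Key, blocks: List[int]) -> int:
    """Polynomial-evaluation MAC: universal hash of the blocks, masked by b."""
    a, b = key
    acc = 0
    for blk in blocks:
        acc = (acc + blk) * a % P
    return (acc + b) % P


def fresh_key() -> Key:
    return secrets.randbelow(P), secrets.randbelow(P)


def placeholder_encrypt(key: Key) -> List[int]:
    # Stand-in for non-committing encryption; NOT secure, structure only.
    return list(key)


def placeholder_decrypt(ct: List[int]) -> Key:
    return ct[0], ct[1]


def send(current_key: Key, message_blocks: List[int]) -> Tuple[Packet, Key]:
    """Authenticate m_i and the encryption of K_{i+1} under K_i."""
    next_key = fresh_key()
    ct = placeholder_encrypt(next_key)
    tag = mac(current_key, message_blocks + ct)
    return (message_blocks, ct, tag), next_key


def receive(current_key: Key, packet: Packet) -> Tuple[List[int], Key]:
    message_blocks, ct, tag = packet
    if mac(current_key, message_blocks + ct) != tag:
        raise ValueError("authentication failed")
    return message_blocks, placeholder_decrypt(ct)
```

Each key authenticates a single transmission (the message together with the ciphertext carrying the next key), so leakage on K_i matters only until the next refresh.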


5.3 On the Difficulty in Achieving General Leakage-Tolerant MPC 


Equipped with Theorem 4.1 we may hope that, similarly to the tasks consid-
ered above, general leakage-tolerant multi-party computation (MPC) would also
follow from known results on adaptively secure MPC. Un-
fortunately, known results do not admit corruption-oblivious simulation and are
in fact far from being leakage-tolerant. We exemplify the relevant difficulties by
giving a protocol that is adaptively secure but not leakage-tolerant. Although
seemingly contrived, the protocol suffers from the same caveats that prevent known
adaptively secure protocols from achieving leakage-tolerance.


Example 5.1. Let F be a standard corruption functionality that takes n-bit in-
puts from two parties, P_0 and P_1, and outputs nothing. As soon as party P_i
provides input x_i, the virtual local state of P_i is set to x_i. Now, consider the fol-
lowing protocol π: first, the parties give their inputs to some trusted party that
returns a random b_i to P_i such that b_0 + b_1 = ⟨x_0, x_1⟩, where ⟨·,·⟩ denotes the inner
product in F_2. (The inner product can be replaced by any two-source extractor.)
Next, the parties output nothing and halt.

It can be seen that π securely realizes F with respect to adaptive corruptions.
This is so since, once the first party P_i is corrupted, the simulator learns x_i and
can give x_i to the adversary, plus a random bit instead of b_i. Now, when P_{1-i}
is corrupted, the simulator learns x_{1-i} and can determine the bit b_{1-i} so that
b_0 + b_1 = ⟨x_0, x_1⟩. However, notice that here the simulator is not corruption-
oblivious: the handling of the second corruption depends on the input value x_i
of the first corrupted party. Indeed, π does not realize F^{+lk} with even one bit
of leakage from each party: the adversary can ask to leak b_i from P_i and thus
learn ⟨x_0, x_1⟩. However, in the ideal model for F^{+lk}, assuming x_0, x_1 are long
random strings, the simulator has no hope of learning ⟨x_0, x_1⟩. This is so since
in the ideal model, the simulator can only perform one-bit leakage on x_0 and x_1
separately, and hence it cannot guess ⟨x_0, x_1⟩ with non-negligible advantage.
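
The real-world attack in this example is easy to replay. In the following Python sketch, which is illustrative only, inputs are lists of bits, addition in F_2 is XOR, and the function names are hypothetical; each party's real state is modeled as the pair (x_i, b_i).

```python
import secrets
from typing import List, Tuple


def inner_product(x0: List[int], x1: List[int]) -> int:
    """Inner product over F_2 of two equal-length bit lists."""
    return sum(a & b for a, b in zip(x0, x1)) % 2


def trusted_party(x0: List[int], x1: List[int]) -> Tuple[int, int]:
    """Hand out random bits b0, b1 with b0 XOR b1 = <x0, x1>."""
    b0 = secrets.randbelow(2)
    b1 = b0 ^ inner_product(x0, x1)
    return b0, b1


def real_world_attack(state0: Tuple[List[int], int],
                      state1: Tuple[List[int], int]) -> int:
    """Leak a single bit from each party separately and combine the results."""
    leak0 = state0[1]  # one-bit leakage on P_0: its share b0
    leak1 = state1[1]  # one-bit leakage on P_1: its share b1
    return leak0 ^ leak1
```

Two one-bit leakage queries, one per party, suffice in the real protocol, whereas in the ideal model the same two queries only see x_0 and x_1 separately.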


Indeed, the same problem would arise in GMW-based protocols, where the value 
of each wire is secret-shared between the parties in a non leakage-resilient manner 
as above. This is actually also the case for Yao-based adaptively secure protocols
(for NC^1 functions); there also (although not explicitly), the value of each wire
is effectively secret-shared between the parties in a non leakage-resilient way. 

Weak (Joint-State) Leakage-Tolerance vs. Strong (Separate-State)
Leakage-Tolerance. Note that, had we modified F^{+lk} in the above example so
that the virtual local state of each party includes both inputs, the above proto-
col would UC-realize F^{+lk} with leakage. More generally, if we settle for a weaker
leakage-tolerance guarantee where the ideal world simulator can jointly leak from
the inputs and outputs of all parties (and not only separately from the inputs
and outputs of each leaking party alone), then leakage-tolerance can already be
achieved. In fact, combining our leakage-tolerant OT protocol with an adaptively
secure protocol, such as GMW, it is easy to obtain semi-honest MPC for general
functions. (We note that this, in particular, concerns the two-party setting where
one party is statically corrupted considered by [DHP11], which can be seen as a
special case of weak leakage-tolerance.)

However, in a setting where real world adversaries are restricted to separate 
leakage from each party, an ideal process that allows joint leakage from the inter- 
nal states of the parties is somewhat unsatisfactory. Achieving strong (separate- 
state) leakage-tolerant MPC in general (without preprocessing or limitations on 
the number of honest parties) remains an interesting open question. 


Acknowledgments. We thank Amit Sahai for telling us about the problems 
with proving leakage-tolerance of the standard three round zero-knowledge pro-
tocols and about the way this problem is solved in [GJS11].


Abstract. We show that the Feistel construction with six rounds and
random round functions is publicly indifferentiable from a random in- 
vertible permutation (a result that is not known to hold for full indiffe- 
rentiability). Public indifferentiability (pub-indifferentiability for short) 
is a variant of indifferentiability introduced by Yoneyama et al. and 
Dodis et al. where the simulator knows all queries made by the dis- 
tinguisher to the primitive it tries to simulate, and is useful to argue the 
security of cryptosystems where all the queries to the ideal primitive are 
public (as e.g. in many digital signature schemes). To prove the result, 
we introduce a new and simpler variant of indifferentiability, that we call 
sequential indifferentiability (seq-indifferentiability for short) and show 
that this notion is in fact equivalent to pub-indifferentiability for state- 
less ideal primitives. We then prove that the 6-round Feistel construction 
is seq-indifferentiable from a random invertible permutation. We also ob- 
serve that sequential indifferentiability implies correlation intractability, 
so that the Feistel construction with six rounds and random round func- 
tions yields a correlation intractable invertible permutation, a notion 
we define analogously to correlation intractable functions introduced by 
Canetti et al. [4]. 


Keywords: indifferentiability, correlation intractability, Feistel 
construction. 


1 Introduction 


Indifferentiability. Indifferentiability has been introduced by Maurer et al. [22] 
as a generalization of the concept of indistinguishability for systems using public 
components (i.e. components that can be queried by any party including the 
adversary). This framework has since then gained much popularity, and starting 
with [7] it has been widely used to analyze hash functions built from a smaller
ideal primitive, e.g. a fixed input-length (FIL) random compression function 
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or an ideal block cipher. Informally, a construction C using an ideal primitive 
F (e.g. a hash function based on a FIL random compression function) is said 
to be indifferentiable from another ideal primitive G (e.g. a random oracle) if 
there exists a simulator S accessing G such that the two systems (G, S^G) and
(C^F, F) are indistinguishable. Roughly, the goal of the simulator is twofold: it
must provide answers that are consistent with G, without deviating too much 
from the distribution of answers of F. Indifferentiability allows modular proofs 
of security in idealized models in the sense that if a construction C is indiffe- 
rentiable from an ideal primitive G, then any cryptosystem proven secure when 
used with G remains secure when used with the construction C^F.¹ For example,
if a cryptosystem is secure in the random oracle model, and some hash function 
construction H^f based on a FIL random compression function f is indifferentiable
from a random oracle, then the cryptosystem is still secure when used
with H^f. More interestingly from a theoretical point of view, Coron et al.
showed that a number of variants of the Merkle-Damgård construction, used
with an ideal cipher in Davies-Meyer mode, are indifferentiable from a random 
oracle. This implies that any functionality that can be securely implemented in 
the random oracle model can also be securely realized in the ideal cipher model. 


The Feistel Construction with Public Round Functions. The Feistel construction
turns a function F from n-bit strings to n-bit strings into an (efficiently
invertible) permutation on 2n-bit strings. It is computed as Ψ^F(L, R) =
(R, L ⊕ F(R)). In their seminal paper [18], which triggered a lot of subsequent
work [20,23,24,28], Luby and Rackoff showed that three (resp. four) rounds of the
Feistel construction, with independent pseudorandom functions in each round,
yield a pseudorandom permutation (resp. strong pseudorandom permutation).
The core of this result is in fact purely information-theoretic [20], meaning that 
the Feistel construction with three (resp. four) rounds and random round func- 
tions is indistinguishable from a random permutation (resp. an invertible random 
permutation) by any computationally unbounded distinguisher limited to a poly- 
nomial number of oracle queries. The Luby-Rackoff theorem crucially relies on 
the secrecy of the round functions. A few papers studied what happens when 
the round functions are made public. In particular, Ramzan and Reyzin [25] 
have shown that the Feistel construction with four rounds remains strongly 
pseudorandom even when the distinguisher has oracle access to the two middle 
round functions (but not to the first or the fourth round function). Dodis and 
Puniya [11] have studied various properties of the Feistel construction
(unpredictability, pseudorandomness) when all intermediate round values of the Feistel
computation are leaked to the adversary, and have shown that in that case a
super-logarithmic number of rounds is necessary and sufficient for the property to
be inherited by the Feistel construction from the round functions.


Indifferentiability of the Feistel Construction. As already mentioned, it 
is possible to securely instantiate a random oracle in the ideal cipher model. 


1 It was recently pointed out that this composition theorem only holds for
cryptosystems whose security is defined by so-called single-stage games [26].

A natural question is whether the other direction holds, namely whether there
is a construction using a random oracle that securely implements a random 
invertible permutation.² Given its numerous cryptographic properties, the Feistel
construction (with public random round functions) appears as an obvious
candidate for this task. Again, this question can be rigorously formulated in 
the indifferentiability framework: namely, is the Feistel construction with suffi- 
ciently many rounds, and public random round functions, indifferentiable from a 
random invertible permutation? Dodis and Puniya [10] considered the problem 
in the so-called honest-but-curious model, where the distinguisher only sees the 
queries made by the Feistel construction to the random round functions, but is 
not allowed to make arbitrary queries to the round functions. In this setting, 
they showed that a super-logarithmic number of rounds is sufficient to securely 
realize a random invertible permutation. However, since full indifferentiability 
is not implied in general by indifferentiability in the honest-but-curious model 
(these two notions are in fact incomparable [9]), they were not able to conclude 
in the general setting. Coron, Patarin, and Seurin [9] gave a first proof that the 
Feistel construction with six rounds is indifferentiable from a random invertible 
permutation. The proof was rather involved, and Künzler later found a distinguishing
attack against the simulator given in [9], therefore invalidating the
indifferentiability proof.³ Only recently, Holenstein et al. gave a new proof
that the Feistel construction with fourteen rounds is indifferentiable from a ran- 
dom invertible permutation, which was inspired from a previous proof for ten 
rounds that appeared in the PhD thesis of Seurin but had some gaps. 


Public Indifferentiability. Yoneyama et al. [29] and Dodis et al. [12] independently
realized that indifferentiability was sometimes stronger than needed to argue
security of cryptosystems. In particular, when all queries made to the ideal
primitive are public (like in many digital signature schemes such as FDH [3], 
probabilistic FDH [6], PSS [3]..., where all queries to the hash function can 
be revealed to the attacker without affecting the security), the weaker notion of 
public indifferentiability is sufficient. [29,12] were both concerned with indifferentiability
from a random oracle and respectively called this notion leaky random
oracle and public-use random oracle. Public indifferentiability is defined simi- 
larly to indifferentiability, but the task of the simulator is made easier by letting 
it know all queries made by the distinguisher to the ideal primitive G. 


Correlation Intractability. Correlation intractability was introduced by Canetti
et al. [4] as an attempt to capture as many security properties of the random
oracle as possible. A family of functions is said to be correlation intractable if 
for a random function of the family it is hard to find a sequence of inputs that 
together with their image satisfy a relation that would be hard to satisfy for a 


2 Such a construction easily implies a secure ideal cipher by simply prepending the
key of the block cipher to the input of each random oracle query.

3 We stress that this does not mean that the 6-round Feistel construction is not in- 
differentiable from a random invertible permutation, but only that no one is able to 
give a proof at the moment. 

uniformly random function (a so-called evasive relation). Correlation intractability
in particular implies collision resistance, pre-image resistance and many other
security properties usually required for cryptographic hash functions. Unfortu- 
nately, Canetti et al. also showed that in the standard model, no correlation 
intractable hash function family exists. A consequence of this non-existence re- 
sult is that there are cryptosystems that are secure in the random oracle model, 
but insecure when the random oracle is instantiated by any function family. 
Though correlation intractability was primarily defined in the standard model, 
it is easily transposable to idealized models. As we will see, our result establishes
a connection between correlation intractability and public indifferentiability. 


Contributions of This Work. We define a new and weaker notion of indif- 
ferentiability that we call sequential indifferentiability (seq-indifferentiability for 
short). This new definition only restricts the order in which the distinguisher 
can query the two oracles it is granted access to: it can first query the primitive 
F (or the simulator S), and then the construction C^F (or the ideal primitive G),
but not F/S again. We show that when the ideal primitive G is stateless (which 
is the most usual case), this notion is equivalent to public indifferentiability
introduced by [12,29], where all queries to the primitive G are public. However, the
seq-indifferentiability notion has the advantage of being simpler and easier to use
in proofs. This simple restriction on the queries of the distinguisher makes it possible
to give a relatively simple proof that the 6-round Feistel construction with random
round functions is seq-indifferentiable (and hence also publicly indifferentiable) 
from a random invertible permutation, a result whose analogue for full indiffe- 
rentiability seems out of reach at the moment. Our result in particular implies 
that any scheme proven secure in the random invertible permutation model or 
the ideal cipher model and where all queries to the ideal primitive can be made 
public without affecting the security (e.g. signature schemes like OPSSR [13] and
subsequent variants [15,5]) remains secure in the random oracle model when using
a 6-round Feistel construction (while the best generic replacement prior
to our work was the 14-round Feistel construction).

Though weaker than full indifferentiability, we also show that seq-indifferen- 
tiability is still sufficiently strong to imply correlation intractability. In particu- 
lar, our result shows that the 6-round Feistel construction with random round 
functions yields a correlation intractable invertible permutation (we note that 
previous observations [9] already implied that the 5-round Feistel construction 
fails to provide a correlation intractable invertible permutation). We discuss the 
implications of this result for chosen-key and known-key attacks on block ci- 
phers [16]. 

On a slightly different topic, we also analyze the Feistel-like domain extension 
construction for ideal ciphers proposed by Coron et al. [8] and show that in 
the seq-indifferentiability model one can obtain a security bound beyond the 
birthday barrier. See the full version of the paper [19].


Open Problems. The most challenging open question is of course whether
the 6-round Feistel construction is fully indifferentiable from a random invertible
permutation, and if not, what is the minimal number of rounds needed to
achieve this property. We hope that our result will constitute a first step towards
a finer understanding of this question. In particular, our result implies
that if the 6-round Feistel construction is not fully indifferentiable from a ran- 
dom invertible permutation, then this cannot be shown by proving that it is not 
correlation intractable as was done for five rounds. Another interesting problem 
is to weaken the assumptions on the round functions and see which property 
would continue to hold: e.g. is the 6-round Feistel construction with correlation 
intractable round functions still a correlation intractable invertible permutation? 
A related question is whether our result could be a first step towards proposing 
plausible constructions of (restricted) correlation intractable function families in 
the standard model, a question left open by [4, Section 5.1].


Organization. In Section 2 we start by giving the definition of sequential indifferentiability
and prove that it is equivalent to public indifferentiability for
stateless ideal primitives. In Section 3 we prove the main result of this paper,
namely that the 6-round Feistel construction is sequentially (and hence publicly)
indifferentiable from a random invertible permutation. In Section 4 we
apply this result to prove the correlation intractability of the 6-round Feistel 
construction. 


2 Preliminaries 


2.1 Notations and Definitions 


Notations. [i..j] will denote the set of integers k such that i ≤ k ≤ j. We will
use n to denote the security parameter, and in sections dealing with the Feistel
construction we will identify n with the input and output length of the round
functions. We will write f ∈ poly(n) to denote a polynomially bounded function
and f ∈ negl(n) to denote a negligible function. When X is a non-empty finite
set, we write x ←_R X to mean that a value is sampled uniformly at random
from X and assigned to x. PPT will stand for probabilistic polynomial-time, and
ITM for interactive Turing machine.


Ideal Primitives. Given two sets Dom ⊂ {0,1}* and Rng ⊂ {0,1}*, we denote by
ℱ(Dom, Rng) the set of all functions from Dom to Rng. A primitive 𝒢 is a sequence
𝒢 = (Dom_n, Rng_n, 𝒢_n)_{n∈N} where 𝒢_n ⊂ ℱ(Dom_n, Rng_n). The ideal primitive G
associated with 𝒢 is the sequence of random variables (G_n)_{n∈N} where G_n is
uniformly distributed over 𝒢_n. We will often adopt the lazy sampling view [1]
to describe ideal primitives queried as oracles.

A random function F = (F_n)_{n∈N} is the ideal primitive associated to the set of
all functions from {0,1}^n to {0,1}^n. Queried as an oracle on an input x, it returns
a uniformly random string in {0,1}^n if x was never queried, or the same answer as
before if x was previously queried.

A random invertible permutation P = (P_n)_{n∈N} is the ideal primitive associated
with the sequence 𝒫 = (Dom_n, Rng_n, 𝒫_n)_{n∈N} where Dom_n = {0,1} × {0,1}^n,
Rng_n = {0,1}^n, and 𝒫_n is the set of functions P such that x ↦ P(0,x) is a
permutation of {0,1}^n, and y ↦ P(1,y) its inverse. Queries of the form (0, x)
and (1, y) will be called respectively forward and backward queries. In the lazy
sampling point of view, P_n keeps two lists L_x and L_y of forward and backward
queries whose image is already defined together with an invertible mapping from
L_x to L_y. Upon receiving a forward query (0, x) such that x ∉ L_x it returns an
answer y uniformly random over {0,1}^n \ L_y, and adds x to L_x and y to L_y
and updates the mapping (and reciprocally for a backward query (1, y)). Later,
we will occasionally refer to L_x and L_y as the history of the random invertible
permutation. An ideal cipher E = (E_n) takes an additional input, the key, of
length ℓ(n), and for each key k ∈ {0,1}^{ℓ(n)}, E_n(k,·) is an independent random
invertible permutation over {0,1}^n.
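
The lazy sampling view of a random invertible permutation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name, the use of integers for n-bit strings, and the rejection-sampling loop are assumptions made here, not part of the paper.

```python
import secrets

class LazyRandomPermutation:
    """Lazily sampled random invertible permutation on n-bit integers (sketch)."""

    def __init__(self, n):
        self.n = n
        self.fwd = {}  # x -> y: the list L_x together with the mapping
        self.bwd = {}  # y -> x: the list L_y together with the inverse mapping

    def _fresh(self, used):
        # Rejection sampling: uniform over {0,1}^n minus the values already used.
        while True:
            v = secrets.randbits(self.n)
            if v not in used:
                return v

    def query(self, direction, value):
        # direction 0 is a forward query (0, x); direction 1 a backward query (1, y).
        if direction == 0:
            table, inverse = self.fwd, self.bwd
        else:
            table, inverse = self.bwd, self.fwd
        if value not in table:
            image = self._fresh(inverse)
            table[value] = image
            inverse[image] = value
        return table[value]
```

A backward query on the answer of a forward query returns the original input, since both tables are updated together.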

A two-sided random function on {0,1}^n, denoted R_n, is very similar to a
random invertible permutation. It also keeps two lists L_x and L_y together with
an invertible mapping from L_x to L_y. However, when receiving a forward query
(0,x) such that x ∉ L_x or a backward query (1,y) such that y ∉ L_y, it returns
a uniformly random answer in {0,1}^n. In case a collision happens, the
previous image or pre-image is removed from L_y or L_x and the mapping is updated
accordingly. Note that a two-sided random function is stateful: it may
return different answers to the same query (however at any time it defines an
invertible mapping from L_x to L_y). A two-sided random function is statistically
indistinguishable from a random invertible permutation: the so-called PRF/PRP
switching lemma [1] established⁴ that an oracle machine making at most q oracle
queries can distinguish P_n from R_n with advantage at most q^2/2^{n+1}.
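
A two-sided random function differs from the lazily sampled permutation only in how fresh answers are drawn and in the handling of collisions. The following Python sketch illustrates one way to read the description above; the class and attribute names are hypothetical.

```python
import secrets

class TwoSidedRandomFunction:
    """Two-sided random function on n-bit integers (illustrative sketch)."""

    def __init__(self, n):
        self.n = n
        self.fwd = {}  # x -> y, for x in L_x
        self.bwd = {}  # y -> x, for y in L_y

    def query(self, direction, value):
        # direction 0 is a forward query (0, x); direction 1 a backward query (1, y).
        if direction == 0:
            table, inverse = self.fwd, self.bwd
        else:
            table, inverse = self.bwd, self.fwd
        if value not in table:
            image = secrets.randbits(self.n)  # uniform answer, collisions allowed
            if image in inverse:
                # Collision: remove the previous (pre-)image so that the
                # mapping between L_x and L_y stays invertible.
                del table[inverse[image]]
            table[value] = image
            inverse[image] = value
        return table[value]
```

Because an entry can be removed after a collision, the same query may later receive a different answer, which is the statefulness noted in the text.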

In the following, we omit the subscripts when the domain and the range of 
an ideal primitive are clear from the context. A construction will simply be a 
Turing machine having oracle access to an ideal primitive and implementing 
another given primitive. The main construction we will consider in this work is 
the Feistel construction. 


The Feistel Construction. Given a function F : {0,1}^n → {0,1}^n, the basic
(1-round) Feistel construction is the permutation on {0,1}^{2n} defined by
Ψ^F(L, R) = (R, L ⊕ F(R)). Its inverse is computed by (Ψ^F)^{-1}(S, T) = (T ⊕
F(S), S). (Here L, R, S, and T are n-bit strings.) The k-round Feistel construction
associated to round functions (F_1, ..., F_k) takes inputs x ∈ {0,1} × {0,1}^{2n}
and is defined by:

\[
\begin{aligned}
\Psi_k^{F_1,\ldots,F_k}(0,(L,R)) &= \Psi^{F_k} \circ \cdots \circ \Psi^{F_1}(L,R) \\
\Psi_k^{F_1,\ldots,F_k}(1,(S,T)) &= (\Psi^{F_1})^{-1} \circ \cdots \circ (\Psi^{F_k})^{-1}(S,T) .
\end{aligned}
\]

Notations used for denoting the intermediate round values for the 6-round Feistel
construction are given in Figure 1. In the following, when considering the Feistel
construction using k independent random functions, we will simply note F =
(F_1, ..., F_k) this tuple of functions and Ψ_k^F = Ψ_k^{F_1,...,F_k}.
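
As a small illustration of the k-round Feistel construction and its inverse, here is a Python sketch. The helper make_random_function and the encoding of n-bit strings as integers are assumptions introduced for this example.

```python
import secrets

def make_random_function(n):
    """Lazily sampled random function on n-bit integers (hypothetical helper)."""
    table = {}
    def f(x):
        if x not in table:
            table[x] = secrets.randbits(n)
        return table[x]
    return f

def feistel_round(f, left, right):
    # One round: (L, R) -> (R, L xor F(R))
    return right, left ^ f(right)

def feistel_round_inverse(f, s, t):
    # Inverse of one round: (S, T) -> (T xor F(S), S)
    return t ^ f(s), s

def feistel(round_functions, direction, block):
    """k-round Feistel; direction 0 is a forward query, 1 a backward query."""
    a, b = block
    if direction == 0:
        for f in round_functions:            # F_1 is applied first
            a, b = feistel_round(f, a, b)
    else:
        for f in reversed(round_functions):  # the round using F_k is undone first
            a, b = feistel_round_inverse(f, a, b)
    return a, b
```

With six round functions created by make_random_function(16), a backward query on the output of a forward query returns the original pair (L, R).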


4 Strictly speaking, the result is proven in [1] for one-sided functions and permutations,
but the proof can be straightforwardly adapted to two-sided primitives.


Fig. 1. Notations used for the 6-round Feistel construction 


2.2 Sequential Indifferentiability 


Indifferentiability was originally formulated within the formalism of random sys- 
tems [21]. We adopt here the simpler formulation using interactive Turing ma- 
chines as in [7]. We first recall the classical definition of indifferentiability [22]. 
For this, we slightly change the way one usually measures the cost of queries of
a distinguisher (this will make our results simpler to express). Given a distinguisher
D, the total oracle queries cost of D is the number of queries received
by the oracle F when D interacts with (C^F, F). Hence this is the sum of the
number of direct queries of D to F and the number of queries made by C to F 
to answer D’s queries. 


Definition 1 ((Statistical, Strong) Indifferentiability). Let q, σ : N → N
and ε : N → R be three functions of the security parameter n. A construction C
with oracle access to an ideal primitive F is said to be statistically and strongly
(q, σ, ε)-indifferentiable from an ideal primitive G if there exists an oracle ITM
S such that for any distinguisher D of total oracle queries cost at most q, S
makes at most σ oracle queries, and the following holds:

\[
\left| \Pr\left[ D^{G,S^{G}}(1^n) = 1 \right] - \Pr\left[ D^{C^{F},F}(1^n) = 1 \right] \right| \le \varepsilon .
\]

C^F is simply said to be statistically and strongly indifferentiable from G if for any
q ∈ poly(n), the above definition is fulfilled with σ ∈ poly(n) and ε ∈ negl(n).


Definition 1 does not refer to the running time of S and D. When only polynomial-time
algorithms are considered, indifferentiability is said to be computational.
Weak indifferentiability is defined as above, but the order of quantifiers for the
distinguisher and the simulator is switched (for every distinguisher, there is a
simulator...). We will mainly be concerned with statistical strong indifferentiability
in this work, but we note that weak indifferentiability is sufficient for our results
on correlation intractability in Section 4.

In order to define our new notion of indifferentiability, we will consider a
restricted class of distinguishers, called sequential distinguishers, which can only
make queries in a specific order. Such a distinguisher first queries the primitive F
(or the simulator S) as it wishes, and then the construction C^F (or the primitive
G) as it wishes, but after its first query to C^F or G, it cannot query S or F
again. Sequential indifferentiability (seq-indifferentiability for short) is defined
relatively to such distinguishers.


Definition 2 (Seq-indifferentiability). A construction C with oracle access
to an ideal primitive F is said to be (statistically and strongly) (q, σ, ε)-seq-indifferentiable
from an ideal primitive G if Definition 1 is fulfilled when D
ranges over the class of sequential distinguishers.
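
The query-order restriction placed on a sequential distinguisher can be pictured as a thin wrapper around its two oracles. The Python sketch below is only an illustration; the class and method names are hypothetical and the oracles are passed in as plain callables.

```python
class SequentialOracles:
    """Gives a distinguisher access to two oracles in sequential order only."""

    def __init__(self, primitive, construction):
        self._primitive = primitive        # F, or the simulator S
        self._construction = construction  # C^F, or the ideal primitive G
        self._second_phase = False

    def query_primitive(self, *args):
        # Allowed only before the first query to the construction side.
        if self._second_phase:
            raise RuntimeError("no F/S query is allowed after a C/G query")
        return self._primitive(*args)

    def query_construction(self, *args):
        # The first call here ends the first phase for good.
        self._second_phase = True
        return self._construction(*args)
```

Any distinguisher written against this wrapper is sequential by construction, since a late query to the primitive side raises an error.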


Full indifferentiability obviously implies seq-indifferentiability. Yoneyama 
et al. [29] and Dodis et al. [12] have introduced another weakened notion of 
indifferentiability, where the primitive G is only queried on public inputs, that 
we call here public indifferentiability (pub-indifferentiability for short). This can 
be formalized as follows: given an ideal primitive G, we define the augmented
ideal primitive Ĝ as the primitive exposing two interfaces: the first (regular) one
is the same as G, and the second is an interface Reveal that, when queried,
returns the ordered sequence of all (regular) queries and corresponding answers
made so far by any party to the regular interface. The second interface can only
be used by the simulator, not by the distinguisher.


Definition 3 (Pub-indifferentiability). A construction C with oracle access
to an ideal primitive F is said to be (statistically and strongly) (q, σ, ε)-pub-indifferentiable
from an ideal primitive G if there exists an oracle ITM S such
that for any distinguisher D of total oracle queries cost at most q, S makes at
most σ oracle queries, and the following holds:

\[
\left| \Pr\left[ D^{G,S^{\hat{G}}}(1^n) = 1 \right] - \Pr\left[ D^{C^{F},F}(1^n) = 1 \right] \right| \le \varepsilon .
\]
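
The augmented primitive with its Reveal interface can be sketched as a transcript-recording wrapper. This Python sketch uses hypothetical names and models the underlying primitive as a callable; it does not enforce who may call reveal, which in the definition is reserved to the simulator.

```python
class AugmentedPrimitive:
    """Wraps a primitive and records every regular query (illustrative sketch)."""

    def __init__(self, primitive):
        self._primitive = primitive
        self._transcript = []

    def query(self, x):
        # Regular interface: behaves exactly like the underlying primitive.
        y = self._primitive(x)
        self._transcript.append((x, y))
        return y

    def reveal(self):
        # Reveal interface: ordered list of all regular queries and answers so far.
        return list(self._transcript)
```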


As explained in [12], the composition theorem of [22] still holds with pub-indif- 
ferentiability for cryptosystems where all messages queried to G can be inferred 
from the adversary’s query during the security experiment. 

Clearly, pub-indifferentiability implies seq-indifferentiability. Indeed, since af- 
ter its first query to G a sequential distinguisher never queries the simulator 
again, the interface Reveal is of no use to the simulator. A less trivial result
is that seq-indifferentiability implies pub-indifferentiability for stateless⁵ ideal
primitives G, thus making seq- and pub-indifferentiability equivalent notions in 
that case. 

5 By stateless we mean that the answer of G to any query only depends on the query
and the randomness of G and not on any additional state information. In particular,
for fixed randomness, G always returns the same answer to a given query.

Theorem 1. Let C be a construction with oracle access to some ideal primitive
F. If C^F is statistically (resp. computationally) strongly (2q, σ, ε)-seq-indifferentiable
from a stateless ideal primitive G, then C^F is statistically (resp.
computationally) strongly (q, σ + q, ε)-pub-indifferentiable from G.


Proof. See the full version of the paper [19]. 


Ristenpart⁶ observed that the above theorem does not hold (at least in the computational
setting) when G is stateful. This is explained in the full version of
the paper [19]. A very simple example makes it possible to separate full indifferentiability
from seq/pub-indifferentiability, namely the Merkle-Damgård construction without
strengthening using a random compression function: it was proven in [7] that
it is not indifferentiable from a random oracle (a consequence of length-extension 
attacks), and in [12] that it is pub-indifferentiable from a random oracle. 
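
The length-extension property behind this separation can be seen on a toy example. The Python sketch below is an illustration under stated assumptions: the compression function is a SHA-256 based stand-in (not a random compression function), and the block length, IV, and function names are chosen here for the example.

```python
import hashlib

BLOCK = 16       # block length in bytes (illustrative choice)
IV = bytes(32)   # fixed initial chaining value (illustrative choice)

def compress(chaining, block):
    # Toy stand-in for a FIL compression function (assumption: SHA-256 based).
    return hashlib.sha256(chaining + block).digest()

def plain_md(message):
    """Merkle-Damgard without strengthening, for block-aligned messages."""
    assert len(message) % BLOCK == 0
    state = IV
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state
```

For block-aligned m1 and a single block m2, plain_md(m1 + m2) equals compress(plain_md(m1), m2): the digest of the longer message can be computed from the digest of m1 alone, a relation that a random oracle would not satisfy.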


3 Seq-Indifferentiability of the 6-Round Feistel 
Construction 


In this section we prove the main result of this paper which states that the Feistel 
construction with 6 rounds and random round functions is seq-indifferentiable 
from a random invertible permutation, and hence also pub-indifferentiable since 
a random invertible permutation is stateless. Before stating the result, we recall 
that in [9], it was shown that the Feistel construction with five rounds is not 
indifferentiable from a random invertible permutation. In fact, the distinguisher 
they described is sequential, which implies that the 5-round Feistel construction 
is not even seq-indifferentiable from a random invertible permutation. We recall 
this attack in the full version of the paper [19].


Theorem 2. The Feistel construction with six rounds and random round functions
is statistically and strongly (q, σ, ε)-seq-indifferentiable from a random invertible
permutation, where:

\[
\sigma(q) = q^2 \quad\text{and}\quad \varepsilon(q) = \frac{2^8 q^8}{2^n} + \frac{q^4}{2^{2n}} .
\]

The rest of this section is devoted to the proof of Theorem 2. We will consider a
sequential distinguisher D that first issues at most q_f queries to the simulator (or
the random functions F_i). These queries will be called F-queries. Then, it issues
at most q_p queries to the random permutation P (or the Feistel construction Ψ_6^F).
These queries will be called P-queries. The total oracle queries cost is q_f + 6q_p
(for each P-query, the Feistel construction makes 6 F-queries to compute the
answer) and is assumed to be less than q.

We start by describing how the simulator S works. It maintains a history of
values for which each round function has been defined (either because this value 
has been queried by the distinguisher, or because the simulator has set this value 


6 Personal communication. 

internally). We will note F_i, i ∈ [1..6], the history of the i-th round function, that
is a set of pairs (U, V) ∈ {0,1}^n × {0,1}^n, where U is an input to round function
F_i and V is the corresponding image (which we denote F_i(U) = V). We write
U ∈ F_i to denote that the image of U by F_i is defined in the history. Initially
round function values F_i(U) are undefined for all i ∈ [1..6] and all U ∈ {0,1}^n.
The images are then modified during the execution of the simulator. F_i(U) ← V
means that the image of U by F_i is set to V and F_i(U) ←_R {0,1}^n means that
the image of U by F_i is set uniformly at random in {0,1}^n. If a round function
value is already in the history and a new assignment occurs, the previous value
is overwritten (alternatively, we could let the simulator abort in this case, as
in [9], but as we will see this happens only with negligible probability so that
the exact behavior of the simulator in such a case is unessential). We will note
H = (F_1, ..., F_6) the complete history of the six round functions.

When the simulator receives an F-query (i, U) (meaning that the distinguisher
asks for the image of U through round function F_i), it calls an internal procedure
Query(i, U). This procedure checks whether the corresponding image is in the
history of F_i, in which case it returns this value and stops. Otherwise it sets the
image uniformly at random. If i = 1, 2, 5, or 6, it does nothing more. If i = 3 or
4, the simulator additionally completes all centers (Y, Z) ∈ F_3 × F_4 newly created
so that the corresponding values of (L, R) and (S, T) obtained by evaluating the
Feistel construction respectively backward and forward are consistent with the
random permutation P, meaning that P(0, (L, R)) = (S, T). This is done by calling
one of two internal procedures, CompleteForward (if i = 4) or CompleteBackward
(if i = 3), which “adapts” two round function values (F_5(A) and F_6(S) for
CompleteForward, and F_1(R) and F_2(X) for CompleteBackward) so that the
Feistel construction matches the random permutation. The pseudo-code for the three
procedures is given below. Statements put in boxes in CompleteForward and
CompleteBackward are replacements for a different system used in the indifferentiability
proof and can be ignored for the moment.

There are two points to prove in order to obtain Theorem 2: that the simulator
runs in polynomial time, and then that the probabilities that the distinguisher
outputs 1 when interacting with (P, S^P) and (Ψ_6^F, F) differ by a negligible quantity
ε. The following lemma shows that the simulator runs in time polynomial
in the number of queries it receives.


Lemma 1. When the simulator is asked at most q queries, then the size of
histories for F_3 and F_4 is at most q, the size of histories for F_1, F_2, F_5 and F_6
is at most q^2 + q, the procedures CompleteForward and CompleteBackward are
called in total at most q^2 times, and the simulator makes at most q^2 queries to
the random permutation.


Proof. Elements are added to the history of F_3 and F_4 only when a corresponding
F-query is made to the simulator, so that the size of their history cannot be
greater than q. For each pair (Y, Z) ∈ F_3 × F_4, either CompleteForward(Y, Z)
or CompleteBackward(Y, Z) is called, at most once, so that in total these procedures
are called at most q^2 times. Since the simulator makes one query to the
random permutation per execution of CompleteForward and CompleteBackward

Algorithm 1 Simulator

 1: variable: round function histories F_1, ..., F_6

 2: procedure Query(i, U)
 3:   if U ∉ F_i then
 4:     F_i(U) ←_R {0,1}^n
 5:     if i = 3 then
 6:       for all Z ∈ F_4 do
 7:         CompleteBackward(U, Z)
 8:     if i = 4 then
 9:       for all Y ∈ F_3 do
10:         CompleteForward(Y, U)
11:   return F_i(U)

12: procedure CompleteForward(Y, Z)
13:   X := Z ⊕ F_3(Y)
14:   Query(2, X)
15:   R := Y ⊕ F_2(X)
16:   Query(1, R)
17:   L := X ⊕ F_1(R)
18:   (S, T) := P(0, (L, R))        [boxed: (S, T) := R(0, (L, R))]
19:   A := Y ⊕ F_4(Z)
20:   F_5(A) ← Z ⊕ S
21:   F_6(S) ← A ⊕ T

22: procedure CompleteBackward(Y, Z)
23:   A := Y ⊕ F_4(Z)
24:   Query(5, A)
25:   S := Z ⊕ F_5(A)
26:   Query(6, S)
27:   T := A ⊕ F_6(S)
28:   (L, R) := P(1, (S, T))        [boxed: (L, R) := R(1, (S, T))]
29:   X := Z ⊕ F_3(Y)
30:   F_2(X) ← R ⊕ Y
31:   F_1(R) ← L ⊕ X



this in turn implies that the total number of queries to P is at most q^2. Finally,
elements are added to the history of F_1, F_2, F_5 and F_6 either when a
query is made to the simulator, or during an execution of CompleteForward
or CompleteBackward, so that the size of their history cannot be greater than
q^2 + q.
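
To make the simulator of Algorithm 1 concrete, here is a Python sketch of its three procedures, together with a lazily sampled permutation standing in for P. The class names, the choice N = 64, and the counter p_queries are assumptions made for illustration; overwriting on adaptation is kept as in the description of the simulator.

```python
import secrets

N = 64  # bit length of round-function inputs and outputs (illustrative choice)

class LazyPermutation:
    """Lazily sampled random invertible permutation on pairs of N-bit integers."""
    def __init__(self):
        self.maps = ({}, {})  # maps[0]: forward table, maps[1]: backward table

    def __call__(self, direction, block):
        table, inverse = self.maps[direction], self.maps[1 - direction]
        if block not in table:
            while True:  # resample until the image is unused
                image = (secrets.randbits(N), secrets.randbits(N))
                if image not in inverse:
                    break
            table[block], inverse[image] = image, block
        return table[block]

class Simulator:
    def __init__(self, perm):
        self.P = perm
        self.F = {i: {} for i in range(1, 7)}  # histories F_1, ..., F_6
        self.p_queries = 0                     # number of queries made to P

    def query(self, i, u):
        if u not in self.F[i]:
            self.F[i][u] = secrets.randbits(N)
            if i == 3:
                for z in list(self.F[4]):
                    self.complete_backward(u, z)
            if i == 4:
                for y in list(self.F[3]):
                    self.complete_forward(y, u)
        return self.F[i][u]

    def complete_forward(self, y, z):
        x = z ^ self.F[3][y]
        r = y ^ self.query(2, x)
        l = x ^ self.query(1, r)
        self.p_queries += 1
        s, t = self.P(0, (l, r))
        a = y ^ self.F[4][z]
        self.F[5][a] = z ^ s  # adapt F_5(A); may overwrite an existing value
        self.F[6][s] = a ^ t  # adapt F_6(S); may overwrite an existing value

    def complete_backward(self, y, z):
        a = y ^ self.F[4][z]
        s = z ^ self.query(5, a)
        t = a ^ self.query(6, s)
        self.p_queries += 1
        l, r = self.P(1, (s, t))
        x = z ^ self.F[3][y]
        self.F[2][x] = r ^ y  # adapt F_2(X); may overwrite an existing value
        self.F[1][r] = l ^ x  # adapt F_1(R); may overwrite an existing value

def feistel6(sim, l, r):
    """Forward 6-round Feistel evaluation with round values asked to the simulator."""
    a, b = l, r
    for i in range(1, 7):
        a, b = b, a ^ sim.query(i, b)
    return a, b
```

Evaluating feistel6 through the simulator on a fresh input (L, R) creates a new center (Y, Z), which is completed so that the result agrees with P(0, (L, R)) unless a value gets overwritten, an event that is very unlikely for N = 64.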


In order to prove that the two systems Σ_1 = (P, S^P) and Σ_4 = (Ψ_6^F, F) are
indistinguishable, we will use two intermediate systems: Σ_2 = (Ψ_6^{S^P}, S^P) where
the P-queries of D are answered by the Feistel construction asking round function
values to the simulator, which itself interacts with P, and Σ_3 = (Ψ_6^{S^R}, S^R)
where the random invertible permutation is replaced by a two-sided random
function R (note the corresponding change in procedures CompleteForward and
CompleteBackward indicated by a boxed statement). The four systems used in
the proof are depicted in Figure 2.

The main part of the analysis is concerned with systems Σ_2 and Σ_3. We will
show that unless some bad event happens, the round function values set by the
simulator in Σ_2 are consistent with P (which will enable us to bound the statistical
distance between Σ_1 and Σ_2), and that in Σ_3 they are uniformly random and
independent (which will enable us to bound the statistical distance between Σ_3 and
Σ_4). In systems Σ_2 and Σ_3, the simulator first receives at most q_f queries from






Fig. 2. Systems used in the seq-indifferentiability proof 


the distinguisher, and then at most 6q_p queries from the Feistel construction
(6 for each P-query of the distinguisher). Hence the total number of queries
received by the simulator is exactly the total oracle queries cost of D, which is
less than q. The statistical distance between answers of systems Σ_2 and Σ_3 is
easily bounded.


Lemma 2. For any distinguisher of total oracle queries cost at most q, the
following holds:

\[
\left| \Pr\left[ D^{\Sigma_2}(1^n) = 1 \right] - \Pr\left[ D^{\Sigma_3}(1^n) = 1 \right] \right| \le \frac{q^4}{2^{2n}} .
\]

Proof. Consider the union of D, Ψ_6, and S as a single distinguisher D′ interacting
either with a random invertible permutation or a two-sided random function.
Note that D′ makes at most q^2 queries to its oracle (Lemma 1). One can conclude
thanks to the PRF/PRP switching lemma [1].
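
As a worked instance of this last step, assume the switching lemma is used in the form "advantage at most q'^2/2^{m+1} for q' queries to a primitive on m-bit strings" (the symbols q' and m are introduced here only for illustration). With q' = q^2 queries (Lemma 1) and m = 2n, since the Feistel construction operates on 2n-bit strings, this gives:

\[
\frac{(q^2)^2}{2^{2n+1}} = \frac{q^4}{2^{2n+1}} \le \frac{q^4}{2^{2n}} .
\]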


Before going further with the proof, we define formally what it means for an
input x ∈ {0,1} × {0,1}^{2n} to the Feistel construction to be computable with
respect to the history of the simulator.


Definition 4 (Computable input). Given a simulator history $H$ and an input
$x \in \{0,1\} \times \{0,1\}^{2n}$, the sequence $\rho_H(x) = (\rho_H(x)[i])_{i \in [0..7]}$ is defined as follows:
- for a forward input $x = (0, (L, R))$, $\rho_H(x)[0] = L$, $\rho_H(x)[1] = R$, and for
  $i = 2$ to $7$:
    if $\rho_H(x)[i-1] \in F_{i-1}$ then $\rho_H(x)[i] = \rho_H(x)[i-2] \oplus F_{i-1}(\rho_H(x)[i-1])$
    else $\rho_H(x)[i] = \bot$
- for a backward input $x = (1, (S, T))$, $\rho_H(x)[7] = T$, $\rho_H(x)[6] = S$, and for
  $i = 5$ to $0$:
    if $\rho_H(x)[i+1] \in F_{i+1}$ then $\rho_H(x)[i] = \rho_H(x)[i+2] \oplus F_{i+1}(\rho_H(x)[i+1])$
    else $\rho_H(x)[i] = \bot$

An input $x$ is said to be computable with respect to $H$ iff $\rho_H(x)[i] \neq \bot$ for all
$i \in [0..7]$. In that case we note $\Psi_6^H(x) = (\rho_H(x)[6], \rho_H(x)[7])$ if $x$ is a forward
input and $\Psi_6^H(x) = (\rho_H(x)[0], \rho_H(x)[1])$ if $x$ is a backward input.
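The definition above can be sketched in Python as follows. This is only an illustration under stated assumptions: the history is modelled as a dictionary mapping each round index 1..6 to a table of already defined values, n-bit strings are plain integers, and the names `rho`, `psi6` and `History` are ours, not the paper's. The symbol ⊥ is represented by `None`.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: round index (1..6) -> {input value: output value}.
History = Dict[int, Dict[int, int]]
Query = Tuple[int, Tuple[int, int]]  # (direction, (u, v)); direction 0 = forward


def rho(history: History, x: Query) -> List[Optional[int]]:
    """Return the sequence rho_H(x)[0..7]; undefined entries are None."""
    direction, (u, v) = x
    seq: List[Optional[int]] = [None] * 8
    if direction == 0:  # forward input (0, (L, R))
        seq[0], seq[1] = u, v
        for i in range(2, 8):
            table = history.get(i - 1, {})
            prev = seq[i - 1]
            if prev is None or prev not in table:
                break  # this entry and all later ones stay undefined
            seq[i] = seq[i - 2] ^ table[prev]
    else:  # backward input (1, (S, T))
        seq[6], seq[7] = u, v
        for i in range(5, -1, -1):
            table = history.get(i + 1, {})
            nxt = seq[i + 1]
            if nxt is None or nxt not in table:
                break  # this entry and all earlier ones stay undefined
            seq[i] = seq[i + 2] ^ table[nxt]
    return seq


def psi6(history: History, x: Query) -> Optional[Tuple[int, int]]:
    """Value of the Feistel construction on x if x is computable, else None."""
    seq = rho(history, x)
    if any(value is None for value in seq):
        return None
    return (seq[6], seq[7]) if x[0] == 0 else (seq[0], seq[1])
```

With a history that defines one value per round, a forward query and the matching backward query traverse the same chain in opposite directions.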


For a computable input $x$, we will often use the notation $(L, R, X, Y, Z, A, S,
T) = \rho_H(x)$ as depicted on Figure []

We now define a bad event that may occur during the execution of the
simulator (in $\Sigma_2$ or $\Sigma_3$) in relation with Lines 20, 21, 30, and 31 of the simulator.
We will say that event Bad happens if in any execution of CompleteForward or
CompleteBackward, the input value whose image is set at Lines 20, 21, 30, or 31
is already in the history of the corresponding round function. This implies that
the simulator overwrites a value so that its answers may not be coherent with $P$
or $R$ any more (see footnote 7). Conversely, if Bad does not happen, then the
simulator never overwrites any value in its history.

We start with the simple observation that if Bad does not happen, then during 
any execution of CompleteForward or CompleteBackward, the query to P or R 
made by the simulator is fresh. 


Lemma 3. In system $\Sigma_2$, if Bad does not happen, then in any execution of
CompleteForward or CompleteBackward the query to $P$ made by the simulator
is not in the history of $P$. For $\Sigma_3$, the corresponding statement holds for $R$.


Proof. The reasoning is the same for $\Sigma_2$ and $\Sigma_3$; we use $\Sigma_2$ to fix ideas. Consider
an execution of CompleteForward(Y, Z). Let $x = (0, (L, R))$ be the query to $P$
made by the simulator, and $(S, T) = P(x)$. If $x$ is already in the history of $P$,
it was necessarily added by a previous execution of CompleteForward(Y', Z') or
CompleteBackward(Y', Z') (note that the distinguisher does not make any query
to $P$ in $\Sigma_2$ or to $R$ in $\Sigma_3$). But since Bad does not happen, round function values
are never overwritten so that necessarily $(Y', Z') = (Y, Z)$. This is impossible
since by construction the simulator makes at most one call to CompleteForward
or CompleteBackward per center $(Y, Z) \in F_3 \times F_4$.


We are now ready to bound the probability that Bad happens in $\Sigma_2$ or $\Sigma_3$.


Lemma 4. For any distinguisher of total oracle queries cost at most $q$, event
Bad happens with probability less than $4q^4/2^n$ in $\Sigma_3$ and less than $4q^4/2^n +
q^4/2^{2n}$ in $\Sigma_2$.


Proof. See the full version of the paper [19].


The following lemma says that as long as Bad does not happen in $\Sigma_2$, the round
function values set by the simulator are consistent with $P$.


Lemma 5. If Bad does not happen in $\Sigma_2$, then for any input $x \in \{0,1\} \times \{0,1\}^{2n}$
computable with respect to the final history of the simulator $H$, $\Psi_6^H(x) = P(x)$.


7 In previous work on indifferentiability of the Feistel construction [9, 27], in such a
case the simulator aborted. It does not change much since, as we will prove, this
happens only with negligible probability.

Proof. Consider an input $x \in \{0,1\} \times \{0,1\}^{2n}$ computable with respect to the
final history $H$ of the simulator, and let $(L, R, X, Y, Z, A, S, T) = \rho_H(x)$. There
was necessarily a call to CompleteForward(Y, Z) or CompleteBackward(Y, Z)
during the execution of the simulator. With respect to the history $H'$ just after
the completion of CompleteForward(Y, Z) or CompleteBackward(Y, Z), it
is clear that $\Psi_6^{H'}(x) = P(x)$. Since Bad does not happen, the simulator never
overwrites a value and the equality remains true until the end of the simulation,
hence $\Psi_6^H(x) = P(x)$.


A direct consequence of this lemma is that as long as Bad does not happen in
$\Sigma_2$, the answers of systems $\Sigma_1$ and $\Sigma_2$ are identically distributed.


Lemma 6. For any distinguisher of total oracle queries cost at most $q$, the
following holds:

$$\left| \Pr[D^{\Sigma_1}(1^n) = 1] - \Pr[D^{\Sigma_2}(1^n) = 1] \right| \le \frac{4q^4}{2^n} + \frac{q^4}{2^{2n}} .$$

Proof. Clearly, answers to F-queries of the distinguisher are identically
distributed in $\Sigma_1$ and $\Sigma_2$ since they are answered by $S^P$ in both systems (whether
Bad occurs or not; see footnote 8). Moreover, in $\Sigma_2$ any P-query $x$ asked by the distinguisher
is computable with respect to the history of the simulator at the time it is
answered by $\Psi_6$, and if Bad does not happen in $\Sigma_2$, then according to Lemma 5
$\Psi_6^H(x) = P(x)$ so that answers to P-queries of the distinguisher are also
identically distributed in both systems. The result follows from Lemma 4.


Lemma 7. If Bad does not happen in system $\Sigma_3$, then the round function values
set by the simulator are uniformly random and independent.


Proof. Since this is clear for round function values set uniformly at random
(independently of Bad occurring or not), we only have to examine values that
are adapted at Lines 20, 21, 30, and 31 of the simulator. But according to
Lemma 3, if Bad does not happen, the query to $R$ made by the simulator in
any execution of CompleteForward or CompleteBackward is not in the history of
$R$, so that the answer $(S, T)$ or $(L, R)$ is uniformly random. Consequently, round
function values set by $F_5(A) \leftarrow Z \oplus S$ and $F_6(S) \leftarrow A \oplus T$ in CompleteForward,
or $F_2(X) \leftarrow R \oplus Y$ and $F_1(R) \leftarrow L \oplus X$ in CompleteBackward are uniformly
random and independent of previous round function values set by the simulator.
Since Bad does not happen, round function values are not overwritten and the
result follows.
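The adaptation step used in this proof can be illustrated with a short Python sketch. The simulator's pseudocode is not reproduced in this section, so everything here is an assumption reconstructed from the adaptation rules $F_5(A) \leftarrow Z \oplus S$ and $F_6(S) \leftarrow A \oplus T$: the function names, the lazy sampling helper `lookup`, the order of the steps, and the stand-in `perm` for the oracle $P$ (or $R$) are ours.

```python
import secrets
from typing import Callable, Dict, Tuple

Tables = Dict[int, Dict[int, int]]  # round index 1..6 -> {input: output}


def lookup(tables: Tables, i: int, x: int, n_bits: int) -> int:
    """Return F_i(x), sampling a uniform n-bit value if it is undefined."""
    return tables[i].setdefault(x, secrets.randbits(n_bits))


def feistel6(tables: Tables, left: int, right: int) -> Tuple[int, int]:
    """Evaluate the 6-round Feistel construction on (L, R); returns (S, T)."""
    a, b = left, right
    for i in range(1, 7):
        a, b = b, a ^ tables[i][b]
    return a, b


def complete_forward(tables: Tables, y: int, z: int,
                     perm: Callable[[int, int], Tuple[int, int]],
                     n_bits: int) -> bool:
    """Adapt F_5 and F_6 for the center (Y, Z); return True if Bad occurred."""
    x = lookup(tables, 3, y, n_bits) ^ z  # Z = X xor F_3(Y)
    r = lookup(tables, 2, x, n_bits) ^ y  # Y = R xor F_2(X)
    l = lookup(tables, 1, r, n_bits) ^ x  # X = L xor F_1(R)
    s, t = perm(l, r)                     # query to P (or R)
    a = lookup(tables, 4, z, n_bits) ^ y  # A = Y xor F_4(Z)
    bad = a in tables[5] or s in tables[6]  # an adapted input is already set
    tables[5][a] = z ^ s                  # F_5(A) <- Z xor S
    tables[6][s] = a ^ t                  # F_6(S) <- A xor T
    return bad
```

When Bad does not occur, the completed chain makes the Feistel evaluation of $(L, R)$ agree with the oracle's answer, which is the consistency property the lemmas rely on.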


This lemma finally enables us to bound the statistical distance between the answers
of $\Sigma_3$ and $\Sigma_4$.


Lemma 8. For any distinguisher of total oracle queries cost at most $q$, the
following holds:

$$\left| \Pr[D^{\Sigma_3}(1^n) = 1] - \Pr[D^{\Sigma_4}(1^n) = 1] \right| \le \frac{4q^4}{2^n} .$$


8 It is crucial here that the distinguisher is sequential, otherwise the simulation in $\Sigma_2$
would be altered by the queries made by $\Psi_6$.

Proof. If Bad does not occur in $\Sigma_3$ then answers of $S^R$ are distributed exactly
as answers of $F$ according to Lemma 7. Hence the statistical distance between
answers of $\Sigma_3$ and $\Sigma_4$ is upper bounded by the probability that Bad happens in
$\Sigma_3$, given by Lemma 4.


Theorem 2 is now a simple consequence of Lemmata 2, 6, and 8.
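The way the three lemmas combine is a standard hybrid argument, spelled out below. The symbols $\varepsilon_6$, $\varepsilon_2$ and $\varepsilon_8$ are our shorthand (not the paper's) for the bounds of the lemmas covering the hops from $\Sigma_1$ to $\Sigma_2$, from $\Sigma_2$ to $\Sigma_3$, and from $\Sigma_3$ to $\Sigma_4$ respectively.

```latex
\begin{align*}
\bigl|\Pr[D^{\Sigma_1}(1^n)=1] - \Pr[D^{\Sigma_4}(1^n)=1]\bigr|
  &\le \sum_{i=1}^{3}
     \bigl|\Pr[D^{\Sigma_i}(1^n)=1] - \Pr[D^{\Sigma_{i+1}}(1^n)=1]\bigr| \\
  &\le \varepsilon_6 + \varepsilon_2 + \varepsilon_8 .
\end{align*}
```

The first inequality is the triangle inequality applied to the telescoping sum over the intermediate systems.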


Remark 1. The strategy of using the intermediate system $\Sigma_2$ is likely to be
quite generic for seq-indifferentiability proofs (system $\Sigma_3$, on the contrary, is
quite specific to the Feistel construction). We believe this could probably make
proofs of pub-indifferentiability (e.g. [12, Section 7]) much easier, but leave this
for future work.


Remark 2. Note that for general distinguishers (not necessarily sequential), the
proof would go through exactly as above for Lemmata 2 and 8. The problematic
step is clearly going from $\Sigma_1$ to $\Sigma_2$. To see what could go wrong if the
distinguisher can interleave queries to $P$ and $S$, consider the following simple
example. $D$ first makes a P-query $P(0, (L, R)) = (S, T)$, and then makes the
sequence of F-queries $F_1(R)$, $F_2(X)$, $F_6(S)$, $F_5(A)$. In system $\Sigma_1$, the simulator
returns uniformly random answers to the four F-queries and will be unable to adapt
$F_3$ and $F_4$, whereas in $\Sigma_2$ the initial P-query of the distinguisher will trigger
six F-queries from $\Psi_6$ which will lead the simulator to adapt the chain when
query $F_4(Z)$ occurs. Making progress towards proving full indifferentiability for
six rounds clearly requires finding the right way to deal with these “external”
chains without knowing the P-queries of the distinguisher.


4 Applications to Correlation Intractability 


Correlation intractability was introduced by Canetti et al. in their work on the
limits of the random oracle methodology [4]. In the standard model, a function
family is said to be correlation intractable if given the description of a random
function $f$ of the family, no PPT algorithm can find an input $x$, or more generally
a sequence of inputs $(x_1, \dots, x_m)$, such that $((x_1, \dots, x_m), (f(x_1), \dots, f(x_m)))$
satisfies a relation that would be hard to satisfy for a uniformly random function.

There is no difficulty in extending the definition of correlation intractability to 
an idealized model: instead of passing the description of the function as input to 
the algorithm, it is granted access to the ideal primitive used by the construction 
C. This way one can define a correlation intractable construction (accessing an 
ideal primitive). 

In all the following, we will consider relations over pairs of binary sequences
(formally, a subset of $\{0,1\}^* \times \{0,1\}^*$). We assume that the machine $M$
returns sequences of strings in $\mathrm{Dom}_n$, the domain of the ideal primitive $G_n$ or the
construction $C^{F_n}$.


Definition 5 (Evasive relation). Let $G = (G_n)$ be an ideal primitive
associated to $\mathcal{G} = (\mathrm{Dom}_n, \mathrm{Rng}_n, \mathcal{G}_n)$. A relation $\mathcal{R}$ over pairs of binary sequences is
said to be evasive with respect to $G$ if for any PPT oracle machine $M$, there is
a negligible function $\epsilon$ such that the following holds:

$$\Pr\bigl[(x_1, \dots, x_m) \leftarrow M^{G_n}(1^n) :
((x_1, \dots, x_m), (G_n(x_1), \dots, G_n(x_m))) \in \mathcal{R}\bigr] \le \epsilon(n) .$$


Definition 6 (Correlation intractable construction). Let $C$ be a construction
with oracle access to an ideal primitive $F = (F_n)$ and implementing some
primitive $G$. $C^F$ is said to be (multiple-output) correlation intractable if for any
relation $\mathcal{R}$ over pairs of binary sequences evasive with respect to $G$, and any
PPT oracle machine $M$, there is a negligible function $\epsilon$ such that:

$$\Pr\bigl[(x_1, \dots, x_m) \leftarrow M^{F_n}(1^n) :
((x_1, \dots, x_m), (C^{F_n}(x_1), \dots, C^{F_n}(x_m))) \in \mathcal{R}\bigr] \le \epsilon(n) .$$


Weak correlation intractability is defined similarly as above by quantifying only
over all polynomial-time recognizable relations (i.e. relations $\mathcal{R}$ such that there
exists a polynomial-time algorithm that, given $((x_1, \dots, x_m), (y_1, \dots, y_m))$,
decides whether it belongs to $\mathcal{R}$ or not).
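The experiment underlying Definitions 5 and 6 can be sketched in a few lines of Python. This is a toy illustration only: `ci_experiment`, `all_zero_outputs` and the two example machines are hypothetical names, the primitive is any callable, and no attempt is made to model running time or negligible functions.

```python
from typing import Callable, Sequence, Tuple

Primitive = Callable[[int], int]
Machine = Callable[[Primitive], Sequence[int]]
Relation = Callable[[Tuple[int, ...], Tuple[int, ...]], bool]


def ci_experiment(machine: Machine, primitive: Primitive,
                  relation: Relation) -> bool:
    """Run the machine with oracle access and test its output against R."""
    xs = tuple(machine(primitive))           # inputs chosen by the machine
    ys = tuple(primitive(x) for x in xs)     # their images under the oracle
    return relation(xs, ys)


def all_zero_outputs(xs: Tuple[int, ...], ys: Tuple[int, ...]) -> bool:
    """Example relation: at least one input, and every image equals zero."""
    return len(xs) > 0 and all(y == 0 for y in ys)


def exhaustive_machine(domain_size: int) -> Machine:
    """Machine that queries the whole (toy-sized) domain for a zero image."""
    def machine(oracle: Primitive) -> Sequence[int]:
        for x in range(domain_size):
            if oracle(x) == 0:
                return [x]
        return [0]
    return machine


def lazy_machine(oracle: Primitive) -> Sequence[int]:
    """Machine that outputs a fixed input without querying the oracle."""
    return [0]
```

For a uniformly random primitive with $n$-bit outputs, each query hits the all-zero image with probability $2^{-n}$, so this relation is evasive against machines making polynomially many queries; the exhaustive machine only succeeds here because the toy domain is tiny.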


Theorem 3. Let $C$ be a construction with oracle access to an ideal primitive
$F = (F_n)$ and implementing some primitive $G$. If $C^F$ is statistically (resp.
computationally) seq-indifferentiable from the ideal primitive $G$, then $C^F$ is
correlation intractable (resp. weakly correlation intractable).


Proof. See the full version of the paper [19].


A direct consequence of Theorems 2 and 3 is that the 6-round Feistel construction
with random round functions is correlation intractable: no polynomial algorithm
with oracle access to the round functions can find a sequence of inputs that
together with their image by the Feistel satisfy a relation that would be hard to
satisfy in the random invertible permutation model. Note that the sole existence
of correlation intractable invertible permutations in the random oracle model was
already implied by the result of Holenstein et al. on the full indifferentiability
of the 14-round Feistel construction (since full indifferentiability implies
seq-indifferentiability and hence correlation intractability), but our result shows
that six rounds are sufficient to achieve this property.


Remark 3. According to Theorem 3, sequential indifferentiability implies
correlation intractability. However, correlation intractability does not necessarily
imply sequential indifferentiability. In the full version of the paper [19] we provide
a simple counter-example separating the two notions.


Implications for Chosen-Key and Known-Key Attacks on Block Ciphers.
Knudsen and Rijmen [16] have introduced so-called known-key attacks
on block ciphers. We discuss the implications of our results regarding this attack
model in the full version of the paper [19].


Collisions Are Not Incidental: A Compression
Function Exploiting Discrete Geometry


Dimitar Jetchev¹, Onur Özen¹, and Martijn Stam²

² Department of Computer Science, University of Bristol, UK


Abstract. We present a new construction of a compression function
$H : \{0,1\}^{3n} \to \{0,1\}^{2n}$ that uses two parallel calls to an ideal primitive
(an ideal blockcipher or a public random function) from $2n$ to $n$ bits.
This is similar to the well-known MDC-2 or the recently proposed MJH
by Lee and Stam (CT-RSA’11). However, unlike these constructions,
we show already in the compression function that an adversary limited
(asymptotically in $n$) to $O(2^{2n(1-\delta)/3})$ queries (for any $\delta > 0$) has
disappearing advantage to find collisions. A key component of our construction
is the use of the Szemerédi–Trotter theorem over finite fields to bound
the number of full compression function evaluations an adversary can
make, in terms of the number of queries to the underlying primitives.
Moreover, for the security proof we rely on a new abstraction that
refines and strengthens existing techniques. We believe that this framework
elucidates existing proofs and we consider it of independent interest.


1 Introduction 


Ever since the initial efforts to turn a blockcipher into a hash function, a major
drawback of using blockcipher-based compression functions producing a digest
size equal to the block-length is that the digest size is too small to produce a hash
function meeting today’s security requirements. For example, AES, operating
on 128 bits, limits collision resistance to at most $2^{64}$ operations/queries. As a
remedy, double-length compression functions and corresponding double-length
hash functions have been introduced (e.g. [3, 5]): A design that outputs $2n$ bits
(while making several calls to a blockcipher with $n$-bit blocks) could potentially
provide collision resistance up to roughly $2^n$ blockcipher evaluations.

In this work, we are interested in the construction of a provably
collision-resistant (beyond $2^{n/2}$ queries) compression function from $3n$ to $2n$ bits making
two parallel calls to an ideal primitive from $2n$ to $n$ bits (either a public random
function, PuRF, or an ideal blockcipher with $n$-bit blocks and $n$-bit keys). Our
motivation is a natural one: All existing designs in this class fall short. There is no
proof, or the proof is not known to extend to the blockcipher case; (non-trivial)
collision resistance is only provided in the iteration; the primitive calls need to
be made in sequence; or the number of calls is higher. Yet known impossibility
bounds [] give no reason why such a construction should not be possible.


R. Cramer (Ed.): TCC 2012, LNCS 7194, pp. 30 320, 2012. 
Fig. 1. Our compression function $H^{f_1,f_2} : \{0,1\}^{3n} \to \{0,1\}^{2n}$ illustrated (see
Construction 1 for the details)


Our Contribution. We provide a construction (see Fig. 1), for which we prove
that any adversary limited (asymptotically in $n$) to $O(2^{2n(1-\delta)/3})$ queries (for
any $\delta > 0$) has disappearing advantage to find collisions. To the best of our
knowledge, this is the first design of its kind offering collision resistance beyond
$2^{n/2}$ queries. Our construction has two key innovative components (see Fig. 1):
a preprocessing function $C^{\mathrm{PRE}}$ that transforms the $3n$-bit input into a pair of
$2n$-bit strings that are passed as inputs to the two ideal primitive calls; and a
postprocessing function $C^{\mathrm{POST}}$ that combines the two outputs of the ideal
primitives and the $3n$-bit input into the $2n$-bit output of the compression function.
Initially, we will concentrate on the PuRF scenario; details for the more
complicated ideal-cipher model follow later (Section 6). In either case, we work in the
ideal-primitive model (giving separate proofs for each scenario).

A major technical hurdle in the proof of collision resistance is that the
standard proof techniques turn out to be insufficient. For concreteness, consider an
adversary that adaptively makes three queries trying to create a collision.
Customarily, one would upper bound the probability $p_i$ (for $i = 1, 2, 3$) that an
adversary causes a collision on the $i$th query, say with $B_i = 1/4$ each; taking a
union bound leads to an overall bound 3/4. Our first abstraction is a game hop
where we allow an adversary to choose its success probability $p_i$ directly, rather
than computing it based on which query to some primitive is being made. By
requiring $p_i \le B_i$ this leads to the same overall winning bound 3/4, achieved by
a greedy adversary. However, this abstraction allows us to phrase and study
different scenarios as well (relevant for our collision resistance proof), for instance
one where we only set a global requirement $\sum_i p_i \le 1/2$. Now potentially each of
the $p_i$ values could be 1/2 itself, so using $\sum_i B_i$ would lead to an overall bound
of 3/2 (which is vacuous for a probability), yet intuitively no adversary should
be able to do better than 1/2. A further complication arises when we require
the adversary to obtain a success at least twice. While it is easy to deal with
non-adaptive adversaries, properly taking care of adaptive adversaries is
non-trivial. We provide the abstraction and solutions to the problems just described
in Section 3. We believe this framework to be of independent theoretical interest.
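The three-query example can be checked with exact arithmetic. The sketch below makes a simplifying assumption that is ours, not the paper's: the adversary fixes the probabilities $p_1, p_2, p_3$ in advance and wins if at least one of the independent trials succeeds. The function name `win_probability` is hypothetical.

```python
from fractions import Fraction
from typing import Iterable


def win_probability(ps: Iterable[Fraction]) -> Fraction:
    """Probability that at least one of the independent trials succeeds."""
    lose = Fraction(1)
    for p in ps:
        lose *= 1 - Fraction(p)
    return 1 - lose


quarter = Fraction(1, 4)
per_step = win_probability([quarter] * 3)           # each p_i at its bound 1/4
all_at_once = win_probability([Fraction(1, 2), 0, 0])  # whole budget in step 1
spread = win_probability([Fraction(1, 6)] * 3)      # budget 1/2 spread evenly
```

Under this model the per-step strategy stays below the union bound 3/4, and with a global budget of 1/2 spending everything in one step reaches exactly 1/2, while spreading the budget does slightly worse.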

The main innovation of our design is the choice for $C^{\mathrm{PRE}}$: the $3n$-bit input
is transformed into a pair of an affine line on $\mathbb{F}_{2^n}^2$ and a point on that line.
Hence, any given valid input pair to the underlying ideal primitives corresponds
to an incidence between a point and a line in the affine plane $\mathbb{F}_{2^n}^2$ over the finite
field $\mathbb{F}_{2^n}$. We then use a classical result of discrete geometry, the
Szemerédi–Trotter theorem over finite fields, to bound the number of incidences between a
set of $q$ lines and a set of $q$ points on $\mathbb{F}_{2^n}^2$, namely by roughly $q^{3/2}$.
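To make the notion of a point-line incidence concrete, here is a brute-force counter over a small affine plane. It departs from the paper in two labelled ways: it works over a prime field $\mathbb{F}_p$ rather than $\mathbb{F}_{2^n}$ (so that field arithmetic is plain modular arithmetic), and it represents a line by a hypothetical slope-intercept pair $(a, b)$ meaning $y = ax + b$, which excludes vertical lines. How the construction itself encodes lines is not specified in this passage.

```python
from itertools import product
from typing import Iterable, Tuple

Point = Tuple[int, int]
Line = Tuple[int, int]  # (a, b) stands for the line y = a*x + b over F_p


def incidences(points: Iterable[Point], lines: Iterable[Line], p: int) -> int:
    """Count pairs (point, line) such that the point lies on the line."""
    point_set = set(points)
    total = 0
    for a, b in set(lines):
        for x in range(p):
            if (x, (a * x + b) % p) in point_set:
                total += 1
    return total


p = 5
all_points = list(product(range(p), repeat=2))
all_lines = list(product(range(p), repeat=2))
```

Taking every point and every non-vertical line gives $p^3$ incidences, since each line contains exactly $p$ points; the interesting regime for the theorem is when both sets are much smaller than the whole plane.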

The postprocessing is inspired by the Rogaway–Steinberger construction [10],
where a special type of $\mathbb{F}_{2^n}$-linear map is used. However, we add the product
of the two primitive-outputs to the inputs to this linear map. This turns out to
be crucial for our collision resistance proof. In Section 5 we prove that the best
strategy for any collision-finding adversary is (close to) maximizing the number
of the aforementioned point-line incidences (in $C^{\mathrm{PRE}}$). Our proof uses the newly
developed techniques given in Section 3 to deal with adaptive adversaries.

Putting the pieces together, we achieve the claimed collision resistance
already at the compression function level. We also prove (everywhere) preimage
resistance up to $O(2^{n(1-\delta)})$ queries (for arbitrary $\delta > 0$). From an efficiency
perspective, our construction makes two parallel calls to distinct primitives, each
with $2n$-bit inputs. The overhead consists of a number of xors (to implement
the matrix-multiplication) plus, more significantly, two full ($\mathbb{F}_{2^n}$) finite field
multiplications: one during the preprocessing and one during the postprocessing.


2 Preliminaries 


Primitive-Based Compression Functions. A compression function is a map
$H : \{0,1\}^{tn} \to \{0,1\}^{sn}$, where $n$ is an integer (the block-length, which in an
asymptotic setting typically takes the role of the security parameter) and $t >
s > 0$ are integer parameters. A compression function is primitive-based if it
is computed by a program making calls to a finite number of specified oracles
(primitives). We use superscripts to denote oracle access. For integers $c$ and $n$, let
$\mathrm{Func}(cn, n)$ denote the set of all maps $\{0,1\}^{cn} \to \{0,1\}^n$ and let $f \xleftarrow{\$} \mathrm{Func}(cn, n)$
denote that $f$ is sampled uniformly at random from all elements in $\mathrm{Func}(cn, n)$.
Then we call $f$ a public random function (PuRF) and we refer to a compression
function making oracle calls to $f$ as PuRF-based. For given input $W$ we denote
the resulting digest as $H^f(W)$. More generally, when there are $r$ independently
sampled primitives $f_1, \dots, f_r \xleftarrow{\$} \mathrm{Func}(cn, n)$ we write $H^{f_1,\dots,f_r}(W)$.

Similarly, let $\mathrm{Block}((c-1)n, n)$ denote the set of all blockciphers having
$(c-1)n$-bit key and operating on $n$-bit blocks. In other words, $\mathrm{Block}((c-1)n, n)$
is the set of all maps $E : \{0,1\}^{(c-1)n} \times \{0,1\}^n \to \{0,1\}^n$, such that for any
key $K \in \{0,1\}^{(c-1)n}$, $E(K, \cdot)$ is a permutation on the set $\{0,1\}^n$. (Note that
$(c-1)n + n = cn$, so that one can interpret $E \in \mathrm{Func}(cn, n)$ as well.) For
a blockcipher $E$, we denote its inverse by $D$, so for all $K \in \{0,1\}^{(c-1)n}$ and
$X \in \{0,1\}^n$ we have that $D(K, E(K, X)) = X$. When $E \xleftarrow{\$} \mathrm{Block}((c-1)n, n)$
is chosen uniformly at random we call it an ideal cipher and refer to a
compression function $H^E$ (or more generally, $H^{E_1,\dots,E_r}$ when there are $r$ independently
sampled blockciphers $E_1, \dots, E_r \xleftarrow{\$} \mathrm{Block}((c-1)n, n)$) as blockcipher-based. The
definitions and the illustrations below are provided in the PuRF-based setting;
the blockcipher-based case is analogous, where we assume that oracle access to
$E$ implicitly implies access to its inverse $D$ as well.

We study single-layer compression functions. This means that the oracle calls
can be made in parallel and the output of the compression function is computed
based on the results of these calls, as well as on the input itself. Formally, let
$C_i^{\mathrm{PRE}} : \{0,1\}^{tn} \to \{0,1\}^{cn}$ for $i = 1, \dots, r$, and $C^{\mathrm{POST}} : \{0,1\}^{tn} \times (\{0,1\}^n)^r \to
\{0,1\}^{sn}$, be pre- and postprocessing functions, respectively. Given a $tn$-bit
input $W$, compute output $Z = H^{f_1,\dots,f_r}(W)$ as follows: for $i = 1, \dots, r$, let
$x_i \leftarrow C_i^{\mathrm{PRE}}(W)$ and $y_i \leftarrow f_i(x_i)$; the output is then $Z \leftarrow C^{\mathrm{POST}}(W, y_1, \dots, y_r)$.
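The single-layer structure just described translates directly into code. The sketch below is generic and deliberately abstract: `compress` and its parameter names are ours, bit strings are modelled as integers, and the toy pre- and postprocessing functions in the usage lines are placeholders with no cryptographic meaning.

```python
from typing import Callable, List, Sequence

PreFn = Callable[[int], int]              # C_i^PRE: input W -> primitive input
Prim = Callable[[int], int]               # f_i: primitive call
PostFn = Callable[[int, List[int]], int]  # C^POST: (W, [y_1..y_r]) -> digest


def compress(w: int, pre_fns: Sequence[PreFn], prims: Sequence[Prim],
             post: PostFn) -> int:
    """Evaluate a single-layer primitive-based compression function on W."""
    if len(pre_fns) != len(prims):
        raise ValueError("need exactly one preprocessing function per primitive")
    xs = [pre(w) for pre in pre_fns]         # all primitive inputs depend on W only
    ys = [f(x) for f, x in zip(prims, xs)]   # calls are independent, so parallelisable
    return post(w, ys)


# Toy usage with two placeholder primitives.
toy_pre = [lambda w: w + 1, lambda w: w * 2]
toy_prims = [lambda x: x * x, lambda x: x + 3]
toy_post = lambda w, ys: w + sum(ys)
digest = compress(2, toy_pre, toy_prims, toy_post)
```

Because every $x_i$ is computed from $W$ alone, no primitive call depends on the answer of another, which is what distinguishes a single-layer design from a sequential one.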


Security Notions. An adversary is an algorithm (typically modelled as an in- 
teractive Turing machine) that uses its oracle access to the underlying primitives 
of the compression function in order to ‘break’ some well-defined property. We 
will limit ourselves to (everywhere) preimage resistance and collision resistance, 
and consider information-theoretic adversaries only; our sole resource of interest 
is the number of queries made to their oracles (adversaries are considered com- 
putationally unbounded). Without loss of generality, adversaries are assumed 
not to repeat queries nor to query an oracle outside of its specified domain. 

When, for some $l \in \{1, \dots, r\}$, an adversary makes an $f_l$-query obtaining
$Y = f_l(X)$, we will append $(l, X, Y)$ to the query history $Q$ (which is initialized
empty). For preimage and collision resistance, adversarial success can be
determined based on the query history $Q$ only, which we formalize using the yield
set (Definition 1) and which we exploit by dropping the explicit sampling of the
primitives $f_i$ and the queries $Q$ for experiments. We partition $Q$ in $Q[1], \dots, Q[r]$
depending on which of the primitives was called and, although technically
elements of $Q$ are triples, we assume that the context suffices to determine which
of the $r$ primitives was used. For $i \le |Q|$, we let $Q_i$ denote the first $i$ elements
of $Q$. Occasionally, we abuse notation by writing $X \in Q$ or $Y \in Q$.


Definition 1. Let $H^{f_1,\dots,f_r}$ be a primitive-based compression function and let
$Q$ be a set of queries (with answers) to the underlying primitives, then the yield
set $\mathsf{yieldset}(Q)$ is the set of all pairs $(W, Z)$ such that $Z = H^{f_1,\dots,f_r}(W)$ and all
queries necessary for the evaluation of the compression function at $W$ are in $Q$.
We refer to the cardinality of $\mathsf{yieldset}(Q)$ as the yield and denote it by $\mathsf{yield}(Q)$.
Additionally, we define $\mathsf{yield}(q) = \max_Q \mathsf{yield}(Q)$ where $|Q[i]| \le q$. (Note that
since $Q$ incorporates the primitives’ answers, the maximum implicitly includes
a maximization over the choice of the underlying primitives.)
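For a toy-sized input space the yield set can be computed by exhaustive enumeration, as in the following sketch. The representation is an assumption of ours: the history is a list with one dictionary per primitive (index 0 for $f_1$, and so on) mapping queried inputs to answers, and `yieldset` walks over an explicit `domain` of inputs, which is only feasible for very small parameters.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

PreFn = Callable[[int], int]
PostFn = Callable[[int, List[int]], int]
QueryHistory = List[Dict[int, int]]  # history[i][x] = answer of primitive i on x


def yieldset(domain: Iterable[int], pre_fns: Sequence[PreFn], post: PostFn,
             history: QueryHistory) -> Set[Tuple[int, int]]:
    """All pairs (W, Z) computable from the queries recorded in the history."""
    pairs: Set[Tuple[int, int]] = set()
    for w in domain:
        xs = [pre(w) for pre in pre_fns]
        if all(x in history[i] for i, x in enumerate(xs)):
            ys = [history[i][x] for i, x in enumerate(xs)]
            pairs.add((w, post(w, ys)))
    return pairs


# Toy usage: 2-bit inputs, primitive 0 sees the low bit, primitive 1 the high bit.
toy_pre = [lambda w: w % 2, lambda w: w // 2]
toy_post = lambda w, ys: (ys[0] + ys[1]) % 4
toy_history = [{0: 1, 1: 2}, {0: 3}]
toy_yield = yieldset(range(4), toy_pre, toy_post, toy_history)
```

In this toy instance three queries already allow two full evaluations, which illustrates why the yield, rather than the raw number of queries, is the quantity the security analysis needs to bound.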


Definition 2 (Collision resistance). Let $H^{f_1,\dots,f_r}$ be a primitive-based
compression function. For a given $Q$ and $Z \in \{0,1\}^{sn}$, define

$$\mathsf{coll}(Q) \equiv \exists_{Z,\, W \neq W'} \; (W, Z), (W', Z) \in \mathsf{yieldset}(Q) .$$

The collision-finding advantage of an adversary $A$ is defined as

$$\mathrm{Adv}_H^{\mathrm{coll}}(A) = \Pr\bigl[f_1, \dots, f_r \xleftarrow{\$} \mathrm{Func}(cn, n),\; Q \leftarrow A^{f_1,\dots,f_r} : \mathsf{coll}(Q)\bigr] .$$

Similarly, define $\mathrm{Adv}_H^{\mathrm{coll}}(q) = \max_A \mathrm{Adv}_H^{\mathrm{coll}}(A)$, where the maximum is taken over
all adversaries $A$ making at most $q$ queries to each of the underlying primitives.


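Definition 3 (Everywhere preimage resistance). Let $H^{f_1,\dots,f_r}$ be a
primitive-based compression function. For a given $Q$ and $Z \in \{0,1\}^{sn}$, define

$$\mathsf{epre}_Z(Q) \equiv \exists_{W'} \; (W', Z) \in \mathsf{yieldset}(Q) .$$

The everywhere preimage-finding advantage of an adversary $A$ is defined as

$$\mathrm{Adv}_H^{\mathrm{epre}}(A) = \max_{Z \in \{0,1\}^{sn}} \Bigl\{ \Pr\bigl[f_1, \dots, f_r \xleftarrow{\$} \mathrm{Func}(cn, n),\; Q \leftarrow A^{f_1,\dots,f_r} : \mathsf{epre}_Z(Q)\bigr] \Bigr\} .$$

We also define $\mathrm{Adv}_H^{\mathrm{epre}}(q) = \max_A \mathrm{Adv}_H^{\mathrm{epre}}(A)$, where the maximum is taken over
all adversaries $A$ making at most $q$ queries to each of the $r$ primitives.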
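Both predicates depend only on the yield set, so they can be evaluated on a plain collection of $(W, Z)$ pairs. The helper names `coll` and `epre` below mirror the notation of the definitions; representing inputs and digests as integers is our simplification.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]  # (W, Z) with Z the digest of W


def coll(yield_pairs: Iterable[Pair]) -> bool:
    """True iff two distinct inputs W != W' share the same digest Z."""
    first_input_for: Dict[int, int] = {}
    for w, z in yield_pairs:
        if z in first_input_for and first_input_for[z] != w:
            return True
        first_input_for.setdefault(z, w)
    return False


def epre(yield_pairs: Iterable[Pair], target: int) -> bool:
    """True iff some input in the yield set is mapped to the target digest."""
    return any(z == target for _, z in yield_pairs)
```

Both functions are monotone in the sense used later in the text: adding more pairs to the yield set can switch the result from false to true, but never back.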


3 Probabilistic Analysis of Adaptive Adversaries 


Most of the security proofs in the literature for compression and hash functions
rely on the same principle. Consider the game depicted in Fig. 2 where the
adversary has access to some underlying primitive $f(\cdot)$ and tries to set a predicate
$E$ that is defined for all collections of query-response pairs. We are primarily
interested in monotone predicates $E$, that once set cannot be ‘unset’ by additional
queries. A predicate $E$ is monotone if and only if for all $Q \subseteq Q'$ it holds that
$E(Q) \Rightarrow E(Q')$. Additionally, we impose non-triviality of the predicate meaning
that the predicate is not set from the outset (i.e. $E(\emptyset) = \mathsf{false}$). For collision
resistance, one should read $\mathsf{coll}$ (Definition 2) for $E$ and for preimage resistance
$\mathsf{epre}_Z$ (Definition 3). Note that $\mathsf{coll}$ and $\mathsf{epre}_Z$ are always monotone and that, for
our construction, both $\mathsf{coll}$ and $\mathsf{epre}_Z$ are non-trivial.

Bounding an advantage is then tantamount to bounding Pr [E(Q)], where the 
probabilities are taken over the randomness of f and the coins of A, if any. In 
the following, we show how we can analyse such events in a stepwise approach 
to determine useful upper bounds. 

There is a distinction between adaptive and non-adaptive adversaries. The 
latter are required to commit to a fixed set of queries at the very beginning 
of the game. In the information-theoretic setting, it is customary (and WLOG) 
to consider deterministic adversaries only. Consequently, maximizing over all 
$q$-query (non-adaptive) adversaries becomes equivalent to maximizing over all
possible query sets of cardinality q. This considerably simplifies proofs. For in- 
stance, when providing a proof in the ideal-cipher model (using a union bound), 
for a non-adaptive adversary every response can be considered fully random, 
whereas for an adaptive adversary previous queries to the cipher might influence 
the outcome slightly. 


Related work. Maurer [] (see also Pietrzak []) developed a methodology to
equate adaptive and non-adaptive adversaries in certain cases. While it is
possible to phrase our game from Fig. 2 in their framework, for many of our winning
predicates adaptive adversaries do have an advantage over non-adaptive
adversaries. Instead we opt for a more direct approach, where we primarily take our
inspiration from existing hash-function security proofs. Henceforth, unless stated
otherwise, we will consider adaptive adversaries only (and consequently drop the
“ad” suffix in naming experiments and advantages).

$\mathrm{Exp}^{E\text{-ad}}(A)$:
  Let $i \leftarrow 0$, $Q_0 \leftarrow \emptyset$
  While $i < q$ do
    $i \leftarrow i + 1$
    $x_i \leftarrow A(Q_{i-1})$
    $y_i \leftarrow f(x_i)$
    $Q_i \leftarrow Q_{i-1} \cup \{(x_i, y_i)\}$
  Return $E(Q_q)$.

$\mathrm{Exp}^{E\text{-na}}(A)$:
  $(x_1, \dots, x_q) \leftarrow A()$
  Let $i \leftarrow 0$, $Q_0 \leftarrow \emptyset$
  While $i < q$ do
    $i \leftarrow i + 1$
    $y_i \leftarrow f(x_i)$
    $Q_i \leftarrow Q_{i-1} \cup \{(x_i, y_i)\}$
  Return $E(Q_q)$.

$\mathrm{Exp}^{E \wedge \neg F}(A)$:
  Let $i \leftarrow 0$, $Q_0 \leftarrow \emptyset$
  While $i < q$ do
    $i \leftarrow i + 1$
    $x_i \leftarrow A(Q_{i-1})$
    $y_i \leftarrow f(x_i)$
    $Q_i \leftarrow Q_{i-1} \cup \{(x_i, y_i)\}$
  Return $E(Q_q) \wedge \neg F(Q_q)$.

Fig. 2. Standard adaptive ($\mathrm{Exp}^{E\text{-ad}}(A)$) and non-adaptive ($\mathrm{Exp}^{E\text{-na}}(A)$) security games
for predicate $E$, as well as the flagged experiment $\mathrm{Exp}^{E \wedge \neg F}(A)$
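The adaptive and flagged experiments can be written as short Python functions. This is a sketch under our own modelling choices: the adversary is a callable that receives the history so far and returns the next query, the primitive `f` is any callable (a real experiment would sample it at random), and predicates are Boolean functions of the history. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Entry = Tuple[int, int]                       # (query x_i, answer y_i)
Adversary = Callable[[Tuple[Entry, ...]], int]
Predicate = Callable[[List[Entry]], bool]


def run_queries(adversary: Adversary, f: Callable[[int], int],
                q: int) -> List[Entry]:
    """Let the adversary make q adaptive queries and return the history."""
    history: List[Entry] = []
    for _ in range(q):
        x = adversary(tuple(history))  # next query may depend on all answers
        history.append((x, f(x)))
    return history


def exp_adaptive(adversary: Adversary, f: Callable[[int], int],
                 predicate: Predicate, q: int) -> bool:
    """Adaptive game: return E(Q_q)."""
    return predicate(run_queries(adversary, f, q))


def exp_flagged(adversary: Adversary, f: Callable[[int], int],
                predicate: Predicate, flag: Predicate, q: int) -> bool:
    """Flagged game: return E(Q_q) and not F(Q_q)."""
    history = run_queries(adversary, f, q)
    return predicate(history) and not flag(history)


def has_output_collision(history: List[Entry]) -> bool:
    """Example predicate E: two distinct queries received the same answer."""
    answers = {}
    for x, y in history:
        if y in answers and answers[y] != x:
            return True
        answers.setdefault(y, x)
    return False
```

A non-adaptive adversary corresponds to the special case where the callable ignores the answers in the history and follows a query list fixed in advance.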


The Straightforward Approach. The standard way of dealing with adaptive
adversaries, as exemplified for instance by the security proofs [] for the
PGV compression functions [], is the following. Suppose an adversary makes
$q$ queries. These are necessarily made in sequence, so denote with $Q_i$ the set
of query-responses after $i$ queries have been made (where $i \in \{0, \dots, q\}$). The
overall winning probability can then be stated as a sum of the probability of
winning on the $i$th step, where these ‘stepwise’ probabilities are only taken over
the choice of $y_i$. This makes derivation of the overall bound relatively easy (even
when taking into account the accompanying maximization).


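Proposition 1. Let $E$ be a monotone non-trivial predicate. Then the advantage
of any (adaptive) adversary $A$ playing $\mathrm{Exp}^E(A)$ (see Fig. 2) is bounded by

$$\mathrm{Adv}^E(A) \le \sum_{i=1}^{q} \; \max_{Q_{i-1} \text{ s.t. } \neg E(Q_{i-1})} \; \max_{x_i} \Pr\bigl[E(Q_i) \mid Q_{i-1} \wedge x_i\bigr] .$$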
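As a worked instance of this stepwise bound, consider the textbook case (our example, not taken from the paper) where $E$ is an output collision for a single random function with $n$-bit outputs. Before the $i$th query there are at most $i - 1$ distinct previous answers, so a fresh uniform answer collides with probability at most $(i-1)/2^n$ regardless of the history, and the sum gives the familiar birthday bound $q(q-1)/2^{n+1}$.

```python
from fractions import Fraction


def stepwise_collision_bound(q: int, n_bits: int) -> Fraction:
    """Sum over i = 1..q of the worst-case step probability (i - 1) / 2^n."""
    size = 2 ** n_bits
    total = Fraction(0)
    for i in range(1, q + 1):
        total += Fraction(i - 1, size)
    return total


def birthday_closed_form(q: int, n_bits: int) -> Fraction:
    """Closed form q(q - 1) / 2^(n + 1) of the same sum."""
    return Fraction(q * (q - 1), 2 ** (n_bits + 1))
```

The bound exceeds 1 once $q$ is around $2^{(n+1)/2}$, at which point it says nothing; this is the usual limit of a per-step union bound.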


Using an Auxiliary Flag. Although easy, the standard approach has the
disadvantage that for many more involved constructions, the maximum probabilities
can get too large. This is typically due to the maximum being attained only for
relatively obscure values for $Q_i$, values that themselves are extremely unlikely
to occur. To weed out these unwanted cases, the analysis is often enhanced by
splitting the monotone predicate into a set of auxiliary events. For some positive
integer $k$, let $E_1, \dots, E_k$ be predicates such that (for all $Q$) $E(Q) \Rightarrow \bigvee_{i=1}^{k} E_i(Q)$,
then a union bound implies $\Pr[E] \le \sum_{i=1}^{k} \Pr[E_i]$. Several examples of proofs using
auxiliary events can be found in the realm of double-length hash functions [6, 14].

The events $E_i(Q)$ themselves are usually composed as the conjunction of a
monotone event and a negated monotone event. In the simplest scenario, consider
a second (non-trivial) monotone predicate $F$. If we define $E_1 = E \wedge \neg F$ and
$E_2 = F$ then $E \Rightarrow E_1 \vee E_2$ is satisfied. To bound $\Pr[E_2] = \Pr[F]$ we can use
Proposition 1; for $\Pr[E_1] = \Pr[E \wedge \neg F]$ Proposition 2 shows how the use of the
predicate $F$ effectively allows us to consider a more restricted class of $Q_i$.

$\mathrm{Exp}^{B}(A)$:
  Let $i \leftarrow 0$
  While $i < q$ do
    $i \leftarrow i + 1$
    $p_i \leftarrow A()$
    if $0 \le p_i \le B_i$ then
      with probability $p_i$ return true
  Return false.

$\mathrm{Exp}^{B_\Sigma}(A)$:
  Let $i \leftarrow 0$
  While $i < q$ do
    $i \leftarrow i + 1$
    $p_i \leftarrow A()$
    if $0 \le \sum_{j=1}^{i} p_j \le B_\Sigma$ then
      with probability $p_i$ return true
  Return false.

Fig. 3. Game-playing interpretation of the adaptive security game (where $B =
(B_1, \dots, B_q)$) and our refined abstract flagging game.


Proposition 2. Let $E$ be a non-trivial monotone predicate and let $F$ be an
arbitrary auxiliary non-trivial monotone predicate. Then the advantage of (adaptive)
adversary $A$ setting $E \wedge \neg F$ is bounded by

$$\Pr[E \wedge \neg F] \le \sum_{i=1}^{q} \Pr\bigl[E(Q_i) \mid \neg E(Q_{i-1}) \wedge \neg F(Q_{i-1})\bigr] \le \sum_{i=1}^{q} B_i$$

where $B_i = \max_{Q_{i-1} \text{ s.t. } \neg E(Q_{i-1}) \wedge \neg F(Q_{i-1})} \max_{x_i} \Pr[E(Q_i) \mid Q_{i-1} \wedge x_i]$.


An Alternative Interpretation. We now make a far bigger step, removing
most of the underlying mechanics of the original game. Instead of letting the
adversary output elements $x_i$ and then determining by virtue of $y_i$ whether
the adversary wins this round, we directly bound the latter probability. That
is, in experiment $\mathrm{Exp}^{B}(A)$ we let the adversary output probabilities $p_i$ and
imagine that $E$ is set with probability $p_i$. To avoid this game becoming vacuous
(namely if the adversary would output some $p_i = 1$) we put bounds $B_i$
on the adversary’s success probability. These bounds correspond to the actual
game: they are the highest possible success probabilities any adversary in any
run can achieve in round $i$. These probabilities are reminiscent of the conditional
probabilities used in the derivation from the standard approach and indeed we
can formalize this relationship. Since in $\mathrm{Exp}^{B}(A)$ a straightforward application
of the union bound leads to an overall upper bound on the winning probability
of $\sum_i B_i$, we can recover Proposition 2.


Lemma 1. Consider games $\mathrm{Exp}^{E \wedge \neg F}$ and $\mathrm{Exp}^{B}$ subject to

$$B_i = \max_{Q_{i-1} \text{ s.t. } \neg E(Q_{i-1}) \wedge \neg F(Q_{i-1})} \; \max_{x_i} \Pr[E(Q_i)] .$$

Then, for all adversaries $A$, there exists an adversary $A'$ s.t. $\mathrm{Adv}^{E \wedge \neg F}(A) \le \mathrm{Adv}^{B}(A')$.

3.1 A More Refined Approach


In $\mathrm{Exp}^{B}(A)$ instead of the guards $0 \le p_i \le B_i$ for all $i$, we could have used
$0 \le \sum_{j=1}^{i} p_j \le \sum_{j=1}^{i} B_j$ as well. With a seemingly minor modification, this leads
to another, different game where instead of bounding step-specific by $\sum_{j=1}^{i} B_j$,
we always use the same bound $B_\Sigma$, as in the game $\mathrm{Exp}^{B_\Sigma}(A)$ (Fig. 3).


Proposition 3. For any adversary $A$, it holds that $\mathrm{Adv}^{B_\Sigma}(A) \le B_\Sigma$.
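The two abstract games can be compared with exact arithmetic. The sketch below assumes, for simplicity, an adversary that announces a fixed sequence of probabilities $p_i \in [0, 1]$, and it assumes the guards read $0 \le p_i \le B_i$ in the per-step game and $0 \le \sum_{j \le i} p_j \le B_\Sigma$ in the budget game; a step whose guard fails simply has no effect. The function names are ours.

```python
from fractions import Fraction
from typing import Sequence


def win_prob_per_step(ps: Sequence[Fraction],
                      bounds: Sequence[Fraction]) -> Fraction:
    """Winning probability in the game with one bound B_i per step."""
    lose = Fraction(1)
    for p, bound in zip(ps, bounds):
        if 0 <= p <= bound:  # guard of the per-step game
            lose *= 1 - p
    return 1 - lose


def win_prob_budget(ps: Sequence[Fraction], budget: Fraction) -> Fraction:
    """Winning probability in the game with a single global budget."""
    lose = Fraction(1)
    spent = Fraction(0)
    for p in ps:
        spent += p
        if 0 <= spent <= budget:  # guard on the running sum
            lose *= 1 - p
    return 1 - lose


quarter, half = Fraction(1, 4), Fraction(1, 2)
```

In the budget game, an adversary that tries to spend 1/4 three times against a budget of 1/2 has its third step blocked, and no fixed strategy tried here exceeds the budget itself, in line with Proposition 3.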


Usage. The $\mathsf{Exp}^{B_\Sigma}$ game captures a special kind of condition that one can
encounter in the $\mathsf{Exp}^{EF}$ game. For any given $Q$, we can a posteriori determine
the probabilities of success by taking $Q_{i-1}$ and $x_i$ from $Q$ and then looking
at the probability that a freshly drawn $y_i$ causes E. The overall a posteriori
probability of some $Q$ is the sum (over $i$) of these probabilities. The maximum
probability attainable this way determines $B_\Sigma$, as formalized in Lemma 2. Of
note here is the observation that for certain games the $B_\Sigma$ obtained here is much
smaller than the $\sum_i B_i$ one would obtain from an application of Lemma 1. Very
broadly speaking (and with some abuse of notation), it is the difference between

$\max_Q \{\sum_i p_i(Q)\}$ and $\sum_i \max_Q \{p_i(Q)\}$.
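The gap between the two quantities can be made concrete with a toy computation. The sketch below is only an illustration: the three query histories and their per-round probabilities are invented numbers, chosen so that each history has one "good" round, and the names `histories`, `b_sigma` and `sum_of_b_i` are hypothetical.

```python
# Toy illustration with made-up numbers: each candidate history Q is
# summarised by its per-round success probabilities p_i(Q).
histories = {
    "Q_a": [0.30, 0.01, 0.01],
    "Q_b": [0.01, 0.30, 0.01],
    "Q_c": [0.01, 0.01, 0.30],
}
rounds = range(3)

# Refined bound: maximise the *sum* of probabilities over a single history.
b_sigma = max(sum(p) for p in histories.values())

# Round-by-round bound: maximise every round separately, then add up.
sum_of_b_i = sum(max(p[i] for p in histories.values()) for i in rounds)

print("max_Q sum_i p_i(Q) =", round(b_sigma, 2))
print("sum_i max_Q p_i(Q) =", round(sum_of_b_i, 2))
```

In this toy setting no single history is good in every round, so the first quantity stays close to 0.3 while the second is close to 0.9.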


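Lemma 2. Consider the game $\mathsf{Exp}^{EF}$. For any given $Q$, define

$$p_i(Q) = \begin{cases} 0 & \text{if } E(Q_{i-1}) \vee F(Q_{i-1}) \text{ or } |Q| < i, \\ \Pr[y_i \leftarrow f(x_i) : E(Q_{i-1} \cup \{(x_i, y_i)\}) \mid Q_{i-1}] & \text{otherwise,} \end{cases}$$

and let $B_\Sigma = \max_Q \sum_i p_i(Q)$. Then for all adversaries $A$ there exists an
adversary $A'$ such that $\mathsf{Adv}^{EF}(A) \le \mathsf{Adv}^{B_\Sigma}(A')$.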


3.2 Counting Successes 


In the previous games we considered a predicate $E(Q)$ that could either be true
or false. In other words, we were interested in at least one success occurring. In
some scenarios, counting the number of successes is more appropriate. To this
end, let ctr be a function such that $\mathrm{ctr}(Q) \in \mathbb{N}$ and $\mathrm{ctr}(Q_i) - \mathrm{ctr}(Q_{i-1}) \in \{0,1\}$
for all possible $Q_i$. For future reference, define the event $\mathrm{hit}(Q_i)$ = true iff
$\mathrm{ctr}(Q_i) = \mathrm{ctr}(Q_{i-1}) + 1$. In the new game $\mathsf{Exp}^{B_\Sigma,\kappa}(A)$, the predicate $E(Q)$ is set
if and only if $\mathrm{ctr}(Q_q) > \kappa$.
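The counting predicate can be sketched in a few lines. This is a minimal illustration, assuming a query history is represented simply as a list of booleans (one per query, true when that query is a hit); the helper names `ctr` and `event_e` are hypothetical.

```python
def ctr(hits):
    """Number of successes in a history given as one boolean per query."""
    return sum(1 for h in hits if h)

def event_e(hits, kappa):
    """E is set if and only if the final counter exceeds kappa."""
    return ctr(hits) > kappa

history = [False, True, False, True, True]

# The counter grows by 0 or 1 with every additional query.
prefix_counts = [ctr(history[:i]) for i in range(len(history) + 1)]
increments = [b - a for a, b in zip(prefix_counts, prefix_counts[1:])]

print(prefix_counts, increments, event_e(history, 2))
```

With `kappa = 0` the predicate reduces to "at least one success", which is the setting of the earlier games.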


Proposition 4. For any non-adaptive adversary, $\mathsf{Adv}^{B_\Sigma,\kappa}(A) \le \binom{q}{\kappa+1}\left(\frac{B_\Sigma}{q}\right)^{\kappa+1}$.

Note that for $\kappa = 0$ we retrieve the result of the preceding section, given that
$\binom{q}{1}\left(\frac{B_\Sigma}{q}\right)^{1} = B_\Sigma$, and it might be tempting to think that for larger $\kappa$ adaptivity
can be argued away. This, however, is not the case: an adaptive adversary does
have an increased advantage playing $\mathsf{Exp}^{B_\Sigma,\kappa}$ when compared to a non-adaptive
one. Nonetheless, we conjecture that the bound just derived is sufficiently loose
to apply to adaptive adversaries as well.
Conjecture 1. For any adaptive adversary, $\mathsf{Adv}^{B_\Sigma,\kappa}(A) \le \binom{q}{\kappa+1}\left(\frac{B_\Sigma}{q}\right)^{\kappa+1}$.


Proposition 5. For any adaptive adversary, $\mathsf{Adv}^{B_\Sigma,\kappa}(A) \le (B_\Sigma)^{\kappa+1}$.


4 A New Double-Length Compression Function 


In this section, we introduce a new compression function (Construction 1; see
also Fig. D) $H: \{0,1\}^{3n} \to \{0,1\}^{2n}$ that makes parallel calls to two random
functions $f_1, f_2: \{0,1\}^{2n} \to \{0,1\}^{n}$. For notational convenience, we often write
the input $W \in \{0,1\}^{3n}$ as $(a,b,c) \in (\{0,1\}^{n})^3$ and identify $\{0,1\}^{n}$ with $\mathbb{F}_{2^n}$.


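Construction 1. Let $f_1, f_2: \{0,1\}^{2n} \to \{0,1\}^{n}$ be two distinct and independently
sampled PuRFs. Define $H^{f_1,f_2}: \{0,1\}^{3n} \to \{0,1\}^{2n}$ to be a single-layer
compression function using the preprocessing function $C^{\mathrm{PRE}}: \mathbb{F}_{2^n}^3 \to (\mathbb{F}_{2^n}^2)^2$
defined by $C^{\mathrm{PRE}} = (C_1^{\mathrm{PRE}}, C_2^{\mathrm{PRE}})$, where

$$C_1^{\mathrm{PRE}}(a,b,c) = (a,b) \quad\text{and}\quad C_2^{\mathrm{PRE}}(a,b,c) = (c, ac + b)$$

and the postprocessing function $C^{\mathrm{POST}}: \mathbb{F}_{2^n}^5 \to \mathbb{F}_{2^n}^2$,

$$C^{\mathrm{POST}}(a,b,c,y_1,y_2) = A \cdot (a, c, y_1, y_2, y_1 y_2)^{T}, \quad\text{where } A = \begin{pmatrix} w_{11} & w_{12} & w_{13} & w_{14} & w_{15} \\ w_{21} & w_{22} & w_{23} & w_{24} & w_{25} \end{pmatrix}$$

is a matrix (over $\mathbb{F}_{2^n}$) satisfying certain non-degeneracy conditions (see Table 1).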
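The construction can be exercised on a toy scale. The sketch below is an illustration under several assumptions that are not taken from the paper: the block size is fixed to n = 8, the field is represented with the AES reduction polynomial, the two PuRFs are replaced by lazily sampled tables (`LazyRandomFunction` is a hypothetical helper), and the matrix `A` is a placeholder chosen only so that $w_{14}w_{25} - w_{24}w_{15} \neq 0$.

```python
import random

N = 8          # toy block size (assumption); the construction is stated for general n
MOD = 0x11B    # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(x, y):
    """Multiply two elements of GF(2^N), encoded as integers below 2^N."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> N:      # degree reached N: reduce modulo MOD
            x ^= MOD
    return r

class LazyRandomFunction:
    """Stand-in for a public random function {0,1}^(2N) -> {0,1}^N."""
    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._table = {}

    def __call__(self, u, v):
        if (u, v) not in self._table:
            self._table[(u, v)] = self._rng.randrange(1 << N)
        return self._table[(u, v)]

# Placeholder matrix (assumption), rows (w11..w15) and (w21..w25).
A = [[1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0]]

def c_pre(a, b, c):
    """Preprocessing: the f1-query (a, b) and the f2-query (c, ac + b)."""
    return (a, b), (c, gf_mul(a, c) ^ b)

def c_post(a, b, c, y1, y2):
    """Postprocessing: A times (a, c, y1, y2, y1*y2)^T over GF(2^N)."""
    vec = (a, c, y1, y2, gf_mul(y1, y2))
    out = []
    for row in A:
        acc = 0
        for w, v in zip(row, vec):
            acc ^= gf_mul(w, v)   # addition in GF(2^N) is XOR
        out.append(acc)
    return tuple(out)

def compress(f1, f2, a, b, c):
    """Single-layer compression: preprocess, query f1 and f2, postprocess."""
    q1, q2 = c_pre(a, b, c)
    return c_post(a, b, c, f1(*q1), f2(*q2))

f1, f2 = LazyRandomFunction(1), LazyRandomFunction(2)
digest = compress(f1, f2, 0x12, 0x34, 0x56)
print(digest)
```

The two primitive calls are independent of each other, which is what allows them to be made in parallel.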


Design Rationale. In the security proofs, we abstract as best as we can the
properties required of $C^{\mathrm{PRE}}$ and $C^{\mathrm{POST}}$. In practice, we recommend using the

                           11001
matrix (cf. Table D) A = ( G 0110 ). Note that in the context of iterating the
compression function, one needs to specify which input blocks represent the
message block and which ones represent the state or chaining variable. Our security
results are independent of this choice. The choice may, however, significantly
affect the efficiency of the design.


Incidence-Based Preprocessing. For a single-layer construction, the preprocessing
function $C^{\mathrm{PRE}}$ fully determines the relationship between the queries made
to the primitive on one hand, and the compression function evaluations this
enables on the other. Our search is therefore for a preprocessing function $C^{\mathrm{PRE}}$
such that yield(q) does not grow too fast as a function of q. In particular, we are
interested in whether we can find a C?® that has good behaviour for q < 2?” 
as well. It turns out, we can do well by exploiting results from incidence geome- 
try. We note the following theorem that is a finite field version of a theorem of 
Szemerédi and Trotter over the reals (see, e.g. for an elementary proof). 


Theorem 2. Let $\mathbb{F}$ be a finite field and $P$ (resp. $L$) be a set of points (resp. lines)
in $\mathbb{F}^2$. Let $I(P, L) = \{(p, \ell) \in P \times L \mid p \in \ell\}$. Then

$$|I(P, L)| \le \min\left(|P||L|^{1/2} + |L|,\; |L||P|^{1/2} + |P|\right).$$
Let $(a,b), (c,d) \in \mathbb{F}_{2^n}^2$ denote the query pairs made to $f_1$ and $f_2$, respectively.
We call a query pair $(a,b)$-$(c,d)$ compatible if and only if $((a,b),(c,d))$ is in
the image of $C^{\mathrm{PRE}}$. In addition, a query $(a,b)$ is called $(c,d)$-compatible or vice
versa if the pair $(a,b)$-$(c,d)$ is compatible. For the preprocessing function $C^{\mathrm{PRE}}$
from Construction 1, a pair $(a,b)$-$(c,d)$ is compatible if and only if $d = ac + b$ is
satisfied. Finally, a preprocessing function $C^{\mathrm{PRE}}$ satisfies the completion property
if and only if (i) $(a,b)$ and $c$, or (ii) $(c,d)$ and $a$, uniquely determine a compatible
query pair $(a,b)$-$(c,d)$ for any $a,b,c,d \in \mathbb{F}_{2^n}$.


Proposition 6. The preprocessing function $C^{\mathrm{PRE}}$ from Construction 1 has the
completion property and $\mathrm{yield}(q) \le q^{3/2} + q$.


Proof. We remark that the completion property can be verified algebraically.
To determine the yield, we interpret $(a,b)$ as the line $y = ax + b$ in $\mathbb{F}_{2^n}^2$
and $(c,d)$ as a point in $\mathbb{F}_{2^n}^2$. This renders bounding the yield an immediate
consequence of Theorem 2. To finish the proof, note that the sets Q[1] and
Q[2] correspond to the lines L and the points P, respectively, and $|I(Q[2], Q[1])|$
counts exactly the number of compression function inputs whose mapping can
be completely determined by the given queries. Specifying |Q[1]| = |Q[2]| = q
yields the proposition statement.
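The line/point interpretation used in the proof can be checked by brute force on a small field. The sketch below is illustrative only: it assumes the toy field GF(2^4) with reduction polynomial x^4 + x + 1, treats every f1-query (a, b) as the line y = ax + b and every f2-query (c, d) as a point, and counts compatible pairs; the function names and the sample size of 40 queries are hypothetical choices.

```python
import math
import random

N = 4
MOD = 0x13   # x^4 + x + 1, irreducible over GF(2)

def gf_mul(x, y):
    """Multiply two elements of GF(2^N), encoded as integers below 2^N."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> N:
            x ^= MOD
    return r

def compatible(line, point):
    """(a, b)-(c, d) is compatible iff d = a*c + b, i.e. the point lies on the line."""
    a, b = line
    c, d = point
    return d == (gf_mul(a, c) ^ b)

def incidences(lines, points):
    """Number of compatible (line, point) pairs; this is the yield of the queries."""
    return sum(1 for l in lines for p in points if compatible(l, p))

def incidence_bound(num_points, num_lines):
    """Right-hand side of the incidence bound of Theorem 2."""
    return min(num_points * math.sqrt(num_lines) + num_lines,
               num_lines * math.sqrt(num_points) + num_points)

field = range(1 << N)
all_pairs = [(u, v) for u in field for v in field]

rng = random.Random(1)
q = 40
lines = rng.sample(all_pairs, q)     # q distinct f1-queries
points = rng.sample(all_pairs, q)    # q distinct f2-queries
observed = incidences(lines, points)
print(observed, "<=", incidence_bound(q, q))

# Completion property, checked exhaustively: (a, b) and c fix d, (c, d) and a fix b.
completion_ok = all(
    sum(compatible((a, b), (c, d)) for d in field) == 1
    and sum(compatible((a, bb), (c, b)) for bb in field) == 1
    for a in field for b in field for c in field
)
print(completion_ok)
```

For |P| = |L| = q the bound evaluates to q^{3/2} + q, which is the yield bound of Proposition 6.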


Non-linear Matrix-Style Postprocessing. Our postprocessing is clearly inspired
by the use of $\mathbb{F}_{2^n}$-matrices by Rogaway and Steinberger [o], but with the
crucial difference that we add the non-linear term $y_1 y_2$. Omitting this non-linear
term is fatal for security. For the fully-linear version an adaptive adversary can
force its evaluated digest to lie (uniformly) on a prespecified set of size $2^n$. In
contrast, for our construction, the adversary's control is significantly reduced.


Security Claims. We state our security claims for collision and (everywhere)
preimage resistance in Theorems 3 and 4, respectively. A sketch of our collision
resistance proof is given in Section 5. We refer to the full version for proofs of
Theorem [4] Corollaries [] and B} 


Theorem 3. Let $H^{f_1,f_2}$ be a single-layer compression function defined by $C^{\mathrm{POST}}$
given in Construction 1, where $C^{\mathrm{PRE}}: \mathbb{F}_{2^n}^3 \to (\mathbb{F}_{2^n}^2)^2$ is any function that satisfies
the completion property. Let $k, \mu, \gamma > 0$ and $\lambda > 3$ be integers and let $\kappa = k\lambda + \mu$.


Then 
2 q k+1 q 
KY OD, (9) +2" (2) P (a 


Qn Qn-1 2(y—1)n—1 Qn Q(u-1)n-1 


(3) 


coll 
Adviz (q) < JOa—2n-1 * 


+ 


Corollary 1. Let $H^{f_1,f_2}$ be the compression function given in Construction 1.
For every $\delta > 0$ and $q = 2^{2n(1-\delta)/3}$, one has $\mathsf{Adv}^{\mathrm{coll}}_{H}(q) = o(1)$ as $n \to \infty$.


Theorem 4. Let $H^{f_1,f_2}$ be a single-layer compression function defined by $C^{\mathrm{POST}}$
given in Construction 1 and an arbitrary $C^{\mathrm{PRE}}$ that satisfies the completion
property. For any integer $k > 1$, one has


spe ni {4 1 \° q kq 
Advi (q) < 2 a ) (=) + gQn—-1 aa Qn—-1 E 


K 
Corollary 2. Let $H^{f_1,f_2}$ be the compression function given in Construction 1.
Then for all $\delta > 0$ and $q = 2^{n(1-\delta)}$, it holds that $\mathsf{Adv}^{\mathrm{epre}}_{H}(q) = o(1)$ as $n \to \infty$.


5 Proof of Collision Resistance (Theorem 3)


5.1 Overall Strategy 


Let $A$ be a collision-finding adversary making at most $q$ queries to each of the
public random functions $f_1$ and $f_2$ (without loss of generality, we assume that
the adversary makes exactly $q$ queries to both). Our goal is to bound $\mathsf{Adv}^{\mathrm{coll}}_{H}(A)$,
in particular $\Pr[\mathsf{coll}(Q)]$, where $Q$ is adaptively generated by $A$. We slightly
abuse notation and use $Q$ (and derived symbols such as $Q_i$) interchangeably as
a random variable (when it is the direct result of playing the collision game), or as
a dummy variable (e.g. when we want to quantify over all possible instantiations),
where the context makes the precise meaning clear. In all cases we can use the
global parameter $q$ for the number of $f_1$ and $f_2$ queries and $Y = \mathrm{yield}(q)$.

To bound the probability of an adversary finding a collision, we first look at
the probability that any specific query completes the collision: fix $i$ and consider
the event $\mathsf{coll}(Q_i) \wedge \neg\mathsf{coll}(Q_{i-1})$. Here we call query $i$ fresh and we say it causes
a collision. For concreteness, suppose the $i$th query is an $f_1$-query $(a,b)$ (the case
for an $f_2$-query $(c,d)$ is analogous), then the first observation is that it adds a
new point to the yield set for every $(a,b)$-compatible pair $(c,d)$ that was already
in $Q_{i-1}$. Now the $i$th query can cause a collision in two different ways:


Case I. Two compatible and colliding pairs $(a,b)$-$(c,d)$ and $(a',b')$-$(c',d')$ are
formed with the triple $\{(a',b'), (c,d), (c',d')\} \subseteq Q_{i-1}$ (where $(a,b) \neq (a',b')$).

Case II. Two distinct compatible and colliding pairs exist with $(a,b) = (a',b')$
and $\{(c,d), (c',d')\} \subseteq Q_{i-1}$, where $(c,d) \neq (c',d')$.


We associate the events $\mathsf{coll}_{I}(Q)$ and $\mathsf{coll}_{II}(Q)$ with these two cases; it follows
that $\mathsf{coll}(Q) = (\mathsf{coll}_{I}(Q) \vee \mathsf{coll}_{II}(Q))$. The probability of finding a collision at the
$i$th step depends strongly on the number of compatible queries already in $Q_{i-1}$;
we denote this number by (random variable) $n_i$. While we know (by design)
that $\sum_i n_i \le \mathrm{yield}(q)$, a straightforward union bound fails to take this into
account properly: because potentially $n_i \approx q$, naive bounding of $\sum_i n_i$ would
be quadratic in $q$ (which is typically much larger than yield(q)). Dealing with this
in case of non-adaptive adversaries is straightforward (as such an adversary needs
to commit to the $n_i$ values in advance), but requires a more careful treatment
in the case of adaptive adversaries. To bound the probability of $\mathsf{coll}_{I}(Q)$, we
additionally condition on not having too many collinear output points. For an
integer $\kappa > 0$, $\mathsf{bad}_{cl[\kappa]}(Q)$ is set if and only if $Q$ leads to more than $\kappa$ collinear
output points in $\mathbb{F}_{2^n}^2$. The reason for collinearity will become evident shortly.


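An Overview of the Proof. We start with the observation, for any $Q$, that

$$\mathsf{coll}(Q) = (\mathsf{coll}_{I}(Q) \vee \mathsf{coll}_{II}(Q)) = (\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{coll}_{II}(Q)) \vee \mathsf{coll}_{II}(Q) ,$$

where the expression $(\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{coll}_{II}(Q))$ is equivalent to

$$(\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{coll}_{II}(Q) \wedge \neg\mathsf{bad}_{cl[\kappa]}(Q)) \vee (\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{coll}_{II}(Q) \wedge \mathsf{bad}_{cl[\kappa]}(Q)) .$$

Using the trivial implications for the above statements, we reach

$$\mathsf{coll}(Q) \Rightarrow \underbrace{(\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{bad}_{cl[\kappa]}(Q))}_{E_1} \vee \underbrace{(\neg\mathsf{coll}_{II}(Q) \wedge \mathsf{bad}_{cl[\kappa]}(Q))}_{E_2} \vee \underbrace{\mathsf{coll}_{II}(Q)}_{E_3} . \qquad (1)$$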


The idea of our proof is to find separate upper bounds for the probability of the
events $E_j$ for $j = 1,2,3$ and then use the union bound to finalize the proof in
Corollary 3 (i.e. $\sum_j \Pr[E_j]$ provides the overall upper bound). An upper bound
for $\Pr[E_1]$ is given in Lemma 7 (corresponding to the term $\kappa Y/2^n$ in Theorem 3).
An upper bound for $\Pr[E_3]$ is established in Lemma 8, which corresponds to the
term $q\gamma^2/2^{n-1} + q^{\gamma}/2^{(\gamma-1)n-1}$ from Theorem 3. Finally, we explain where the
bounds for $\Pr[E_2]$ (i.e. the remaining terms from Theorem 3) come from. We
use Proposition 9 to establish an implication that leads to an upper bound for
$\Pr[E_2]$. Moreover, several auxiliary events, which are defined and investigated
in Sections 5.2 and 5.4, are required to finalize the bound on $\Pr[E_2]$. The upper
bounds for the auxiliary events are given in Lemmas 3, 4, 5 and 9.
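The implication behind the three events can be confirmed mechanically by enumerating all truth assignments. The sketch below treats the three underlying events (a Case I collision, a Case II collision, too much collinearity) as plain booleans; the function name `implication_holds` is hypothetical.

```python
from itertools import product

def implication_holds(coll_i, coll_ii, bad_cl):
    """Check coll => E1 or E2 or E3 for one truth assignment."""
    coll = coll_i or coll_ii
    e1 = coll_i and not bad_cl          # Case I collision without excess collinearity
    e2 = (not coll_ii) and bad_cl       # excess collinearity, no Case II collision
    e3 = coll_ii                        # Case II collision
    return (not coll) or e1 or e2 or e3

all_ok = all(implication_holds(*bits)
             for bits in product([False, True], repeat=3))
print(all_ok)
```

Since the implication holds for every assignment, a union bound over the three events bounds the collision probability.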


On the Matrix A Used in $C^{\mathrm{POST}}$. In the following, we consider a general
matrix A (see Construction 1) over $\mathbb{F}_{2^n}$ for the proof of Theorem 3. The
conditions on the entries of the matrix A required throughout the paper, as well as
where they are used, are provided in Table 1. (Note that the probability that a
randomly selected matrix A satisfies our criterion is close to one.)


Output Lines. By assumption, we know that an $f_1$-query $(a,b)$ can only complete
a collision using an already present compatible $f_2$-query $(c,d)$. Let $(a,b)$
be an $f_1$-query and let $(c,d)$ be a preceding $(a,b)$-compatible $f_2$-query with
$y_2 = f_2(c,d)$. The output $C^{\mathrm{POST}}(a,b,c,y_1,y_2)$ of the compression function on
input $(a,b,c)$ then lies on the line (in $\mathbb{F}_{2^n}^2$)

$$L_{1;c,d,y_2;a} = \left\{ \underbrace{\begin{pmatrix} a w_{11} + c w_{12} + y_2 w_{14} \\ a w_{21} + c w_{22} + y_2 w_{24} \end{pmatrix}}_{\text{offset}} + y_1 \underbrace{\begin{pmatrix} w_{13} + y_2 w_{15} \\ w_{23} + y_2 w_{25} \end{pmatrix}}_{\text{slope}} \;\middle|\; y_1 \in \mathbb{F}_{2^n} \right\} , \qquad (2)$$

where we get the actual output point for $(a,b,c)$ by setting $y_1 = f_1(a,b)$. The
randomness of $f_1$ results in a random point on $L_{1;c,d,y_2;a}$. Note that the line
cannot be degenerate (see condition C1 in Table 1), i.e. it has nonzero slope.
Similarly, let $(c,d)$ be an $f_2$-query and let $(a,b)$ be a preceding $(c,d)$-compatible
$f_1$-query. The output of the compression function on $(a,b,c)$ lies on the line:

$$L_{2;a,b,y_1;c} = \left\{ \begin{pmatrix} a w_{11} + c w_{12} + y_1 w_{13} \\ a w_{21} + c w_{22} + y_1 w_{23} \end{pmatrix} + y_2 \begin{pmatrix} w_{14} + y_1 w_{15} \\ w_{24} + y_1 w_{25} \end{pmatrix} \;\middle|\; y_2 \in \mathbb{F}_{2^n} \right\} . \qquad (3)$$

This time the output point is obtained by setting $y_2 = f_2(c,d)$. Again, the
randomness of $f_2$ results in a random point on $L_{2;a,b,y_1;c}$. We note that this time
Table 1. Quick recap of the properties of the entries of A (see Construction 1) used
in the proof of Theorem 3. (N) denotes that the condition is necessary, whereas (S)
denotes it is sufficient.


The Condition Where used 
TS 


C3: Lemma 
C4: Lemma [B] 
Construction [L 


non-degeneracy follows from $w_{14}w_{25} - w_{24}w_{15} \neq 0$ (see condition C2 in Table 1).
Now it is easy to see why we do not want too many collinear points: it would
ease the collision-finding considerably due to the above output lines.
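The offset and slope of both output lines can be read off by expanding the postprocessing map from Construction 1 and grouping terms. The derivation below uses only that definition; the grouping by $y_1$ and by $y_2$ is the step spelled out here.

```latex
\begin{align*}
A\,(a, c, y_1, y_2, y_1 y_2)^{T}
 &= \begin{pmatrix}
      a w_{11} + c w_{12} + y_1 w_{13} + y_2 w_{14} + y_1 y_2 w_{15} \\
      a w_{21} + c w_{22} + y_1 w_{23} + y_2 w_{24} + y_1 y_2 w_{25}
    \end{pmatrix} \\
 &= \begin{pmatrix} a w_{11} + c w_{12} + y_2 w_{14} \\ a w_{21} + c w_{22} + y_2 w_{24} \end{pmatrix}
    + y_1 \begin{pmatrix} w_{13} + y_2 w_{15} \\ w_{23} + y_2 w_{25} \end{pmatrix}
    && \text{(grouped by } y_1\text{)} \\
 &= \begin{pmatrix} a w_{11} + c w_{12} + y_1 w_{13} \\ a w_{21} + c w_{22} + y_1 w_{23} \end{pmatrix}
    + y_2 \begin{pmatrix} w_{14} + y_1 w_{15} \\ w_{24} + y_1 w_{25} \end{pmatrix}
    && \text{(grouped by } y_2\text{)}
\end{align*}
```

In the second grouping the slope vanishes only if $w_{14} = y_1 w_{15}$ and $w_{24} = y_1 w_{25}$ hold simultaneously, which would force $w_{14}w_{25} - w_{24}w_{15} = 0$; requiring this expression to be nonzero therefore rules out a degenerate line.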


5.2 Partitions, Bunches and Some Auxiliary Events 


Partitions and Bunches. Suppose that an $f_2$-query $(c,d)$ results in $y_2 =
f_2(c,d)$. By the completion property, we obtain, for each $a \in \mathbb{F}_{2^n}$, a unique $b$
such that $(a,b)$ is $(c,d)$-compatible. Now we recall that if we query $f_1(a,b)$, the
resulting yield point lies on the line $L_{1;c,d,y_2;a}$. From Equation (2) of $L_{1;c,d,y_2;a}$, it
follows that the slope of these lines is fixed (because $(c,d)$ and $y_2$ are fixed) and
independent of $a$; hence by ranging over all possible $a \in \mathbb{F}_{2^n}$ we obtain a set of
(parallel) lines. This is what we call a partition (partitions due to an $f_1$-query are
defined analogously): $P_{1;c,d,y_2} = \{L_{1;c,d,y_2;a} \mid a \in \mathbb{F}_{2^n}\}$. The opposite notion to a
partition is a bunch: for all preceding and $(a,b)$-compatible $(c_j,d_j) \in Q$, for some
integer $j \ge 1$, the bunch of interest is the collection of lines (for $y_{2;j} = f_2(c_j,d_j)$)

$$B_{1;(a,b)}(Q) = \{ L_{1;c_j,d_j,y_{2;j};a} \mid (c_j, d_j, y_{2;j}) \in Q \wedge (c_j, d_j) \text{ compatible with } (a,b) \} .$$

(We also write $B_{1;i}$ if the $i$th query is an $f_1$-query $(a,b)$.) The answer $y_1 = f_1(a,b)$
specifies a point on each of these lines to be added to the yield set; we refer to
this as realizing the bunch. For the record, $B_{2;(c,d)}(Q)$ is defined analogously.


Degenerate Partitions. We have seen that a partition contains a set of parallel 
lines. If different choices of a lead to different lines, the lines compatible to 
(c,d) necessarily partition the output plane (justifying our terminology). It is 
possible however that regardless of the a values, we end up with identical lines 
(though with a different parametrization). In such a case, a partition collapses 
to a single line and we speak of a degenerate partition. A degenerate partition 
causes problems in our proof, because it allows an adversary to create many 
collinear points (by ranging over $a$). Let $\mathsf{bad}_{dp}(Q)$ denote the event that $Q$ gives
rise to a degenerate partition (either via different $a$ or $c$ values).
Lemma 3. Let $Q$ be generated adaptively, then $\Pr[\mathsf{bad}_{dp}(Q)] \le q/2^{n-1}$, and if
$w_{11}w_{23} \neq w_{13}w_{21}$, $w_{12}w_{24} \neq w_{14}w_{22}$, $w_{11}w_{25} = w_{15}w_{21}$ and $w_{12}w_{25} = w_{15}w_{22}$, then

$$\Pr[\mathsf{bad}_{dp}(Q)] = 0 .$$


Parallel Partitions. We now define another bad event, parallel partitions, that
can potentially help a collision-finding adversary create collinear points. We have
seen that, once answered, a single $f_2$-query $(c,d)$ determines a well-defined slope
for the partition $P_{1;c,d,y_2}$. If two or more distinct partitions (of the same type)
have the same slope, we call the partitions parallel. The number of parallel
partitions is tightly related to a standard occupancy problem. Consequently,
avoiding parallel partitions altogether is not realistic, yet we can put reasonable
bounds on too much parallelism occurring. We define $\mathsf{bad}_{pp[\mu]}(Q)$ to be the event
that $Q$ results in more than $\mu$ parallel partitions (of identical type).


Lemma 4. Let Q be generated adaptively. Then, for any integer u > 0, 


G 
Pr [badpp{yj(Q)] < crea 


Local Collinearity. Now we discuss another auxiliary event, local collinearity,
that is used in our collinearity analysis. Suppose an $f_1$-query results in $y_1 =
f_1(a,b)$. We associate with this query-response pair a point $(a, y_1) \in \mathbb{F}_{2^n}^2$. Let
$\mathsf{bad}_{lc[\lambda]}(Q)$ be the event that there exist at least $\lambda$ pairs of $f_1$-queries $(a_i, b_i)$
with distinct $a_i$ values, such that the associated points $(a_i, y_{1;i})$ are collinear or,
alternatively, that there exist at least $\lambda$ pairs of $f_2$-queries $(c_i, d_i)$ with distinct
$c_i$ values, such that the points $(c_i, y_{2;i})$ are collinear.


Lemma 5. Let Q be generated adaptively. Then, for any integer \ > 0, 


q 
Pr [badic (2)] < s= ; 


Target Local Collinearity. For local collinearity, we are interested in any $\lambda$
associated points being collinear, without worrying about which line they are
on. However, in an upcoming case we are only interested in points all lying on
a line with a pre-specified slope (the offset of the line is not fixed in advance).
Let $\mathsf{bad}_{tlc[\gamma]}(Q)$ be the event that Q[1] or Q[2] leads to more than $\gamma$ associated
points collinear with pre-specified, non-vertical slope.


Lemma 6. Let $Q$ be generated adaptively. Then, for any integer $\gamma > 0$,

$$\Pr[\mathsf{bad}_{tlc[\gamma]}(Q)] \le \frac{q^{\gamma}}{2^{(\gamma-1)n-1}} .$$


5.3 Bounding Collisions: Focusing on $\Pr[E_1]$ and $\Pr[E_3]$


Lemmas 7 and 8 provide upper bounds for $\Pr[E_1]$ and $\Pr[E_3]$, respectively.
Lemma 7. Let $i$ be a positive integer that satisfies $i \le q$ and let $Q_{i-1}$ be an
arbitrary query list satisfying $\neg\mathsf{bad}_{cl[\kappa]}(Q_{i-1})$ (for some positive integer $\kappa$). Then

$$\Pr[E_1] = \Pr[\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{bad}_{cl[\kappa]}(Q)] \le \frac{\kappa Y}{2^n} .$$


Proof (Sketch). We start by noticing that $\Pr[E_1] \le \Pr[\mathsf{coll}_{I}(Q) \mid \neg\mathsf{bad}_{cl[\kappa]}(Q)]$.
Each of the $n_i$ compatible elements, together with the $i$th query, defines a line
such that the random answer to the $i$th query will determine which point will
be added to the yield set. The condition $\neg\mathsf{bad}_{cl[\kappa]}(Q_{i-1})$ implies that on each
of these lines, there are at most $\kappa$ previous yield points. Since the underlying
primitive is a random function, the answer is fully random and for a given line,
one of the previous yield points is hit with probability at most $\kappa/2^n$. A union
bound over the $n_i$ lines gives the bound $n_i\kappa/2^n$. To obtain the overall bound,
we exploit our refined game $\mathsf{Exp}^{B_\Sigma}(A)$: we use the above to determine
$B_\Sigma = \kappa Y/2^n$, as $\sum_i n_i \le Y$ (Proposition 3).


We now bound the probability of finding an instantaneous collision with a fresh
query, first given that $\neg\mathsf{bad}_{tlc[\gamma]}(Q_{i-1})$ holds. Then we finalize our bound for
$\Pr[\mathsf{coll}_{II}(Q)]$ using Proposition 2 along with Lemma 6.


Lemma 8. Let $i$ be a positive integer that satisfies $i \le q$ and let $Q$ be generated
adaptively. Then, for any integer $\gamma > 0$,

$$\Pr[E_3] = \Pr[\mathsf{coll}_{II}(Q)] \le \frac{q\gamma^2}{2^{n-1}} + \Pr[\mathsf{bad}_{tlc[\gamma]}(Q)] .$$


5.4 Bounding Overall Collinearity: Bounding Pr[E2] 


We now bound $\Pr[E_2]$. The main technical difficulty is to properly separate the
randomness of the $f_1$- and $f_2$-queries. In order to do this in the adaptive setting,
we use a method that we call bunching. For a fixed $i$, suppose that the $i$th query
is an $f_1$-query $(a,b)$. Recall that for the $f_1$-query $(a,b)$, the bunch $B_{1;i}$ consists
of the lines $L_{1;c_j,d_j,y_{2;j};a}$ for the $n_i$ compatible preceding $f_2$-queries $(c_j,d_j)$ (with
$y_{2;j} = f_2(c_j,d_j)$ for $j = 1,\ldots,n_i$). The answer $y_1 = f_1(a,b)$ adds a single point
to the yield set for each compatible $f_2$-query $(c_j,d_j)$. These $n_i$ new points lie on
the lines $L_{1;c_j,d_j,y_{2;j};a}$, thereby realizing the bunch $B_{1;i}$. We refer to the set of
freshly added points inside a bunch as a constellation that we denote by

$$C_{1;i}(Q) = \{ H^{f_1,f_2}(a,b,c_j) \mid (c_j,d_j) \in Q_{i-1} \wedge (c_j,d_j) \text{ compatible with } (a,b) \} .$$


In order to determine the maximum collinearity within the yield set, we estimate
(i) the probability of too much collinearity occurring within a single constellation
(Proposition 8) and (ii) the probability of too many constellations being collinear
(Lemma 9). Here, a set of constellations is collinear if we can choose a point from
each constellation in the set such that all chosen yield points are collinear. If we
know that at most $\lambda$ points are collinear within a single constellation, and at
most $k$ constellations are collinear, we can conclude that at most $\kappa = k\lambda$ points
are collinear overall. This is formalized in the proposition below, taking into
account an additional technicality.
Proposition 7. Let $k, \lambda, \mu > 0$ be fixed integers, $\kappa = k\lambda + \mu$ and let $\mathsf{bad}_{int[\lambda]}(Q)$
be the event that there exists a constellation having more than $\lambda$ collinear points.
Define $\mathsf{bad}_{ext[k]}(Q)$ to be the event that there exists a line $\ell$ passing through more
than $k$ constellations whose bunches do not contain $\ell$. Then (for arbitrary $Q$)

$$\mathsf{bad}_{cl[\kappa]}(Q) \Rightarrow (\mathsf{bad}_{int[\lambda]}(Q) \vee \mathsf{bad}_{ext[k]}(Q) \vee \mathsf{bad}_{pp[\mu]}(Q)) .$$

Proposition 8 is used to decompose the event $\mathsf{bad}_{int[\lambda]}(Q)$ into two events.


Proposition 8. For arbitrary $Q$, if (integer) $\lambda > 3$ then
$(\neg\mathsf{coll}_{II}(Q) \wedge \mathsf{bad}_{int[\lambda]}(Q)) \Rightarrow (\mathsf{bad}_{dp}(Q) \vee \mathsf{bad}_{lc[\lambda]}(Q))$.


To bound collinearity between constellations, we first consider collinearity with
a given line $\ell$ in the output plane. We are interested in bounding the probability
that at least $k$ constellations are incident to $\ell$. For a line $\ell$, integer $k$ and query
history $Q$, let $\mathsf{bad}_{\ell\text{-hit}[k]}(Q)$ be the following event: there exist at least $k$
constellations whose corresponding bunches do not contain $\ell$ that are incident to $\ell$.
Recall that $\mathsf{bad}_{ext[k]}(Q)$ is the event that there exists a line $\ell$ passing through
more than $k$ constellations whose bunches do not contain $\ell$.


Lemma 9. Let £ be given and let Q be generated adaptively. Then 


k+1 y \Ft 

Pr [badeni (2)] < (=) and Pr [badextiij(Q)] < 27” (=) : 
Proof (Sketch). Let $\mathrm{ctr}_{\ell\text{-hit}}(Q)$ be the number of constellations that are incident
to $\ell$, again restricted to those constellations whose corresponding bunch does not
contain $\ell$. Clearly, the event $\mathsf{bad}_{\ell\text{-hit}[k]}(Q)$ is equivalent to $\mathrm{ctr}_{\ell\text{-hit}}(Q) \ge k$. Note
that for any $i$, we have $\mathrm{ctr}_{\ell\text{-hit}}(Q_i) - \mathrm{ctr}_{\ell\text{-hit}}(Q_{i-1}) \in \{0,1\}$ since constellation $i$
can be counted at most once (namely if it is incident to $\ell$). Let $\mathrm{hit}_{\ell\text{-hit}}(i)$ be the
event that the bunch $B_i$ upon realization is incident to $\ell$. Suppose that $\ell \notin B_i$ and
that $B_i$ consists of $n_i$ lines (each containing an output point). Since $\ell$ intersects
each line in a bunch in at most one point, we obtain that $\Pr[\mathrm{hit}_{\ell\text{-hit}}(i)] \le n_i/2^n$.
Due to yield restrictions, $\sum_i n_i \le Y$. The lemma statement follows from
applying Proposition 4 with $B_\Sigma = Y/2^n$. The statement for $\Pr[\mathsf{bad}_{ext[k]}(Q)]$
follows from the union bound over all lines $\ell$.


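Proposition 9. Let $k$, $\lambda$, and $\mu$ be positive integers with $\lambda > 3$ and $\kappa = k\lambda + \mu$.
Then, for arbitrary $Q$,

$$\Pr[\neg\mathsf{coll}_{II}(Q) \wedge \mathsf{bad}_{cl[\kappa]}(Q)] \le \Pr[F(Q)] , \text{ where}$$
$$\Pr[F(Q)] \le \Pr[\mathsf{bad}_{ext[k]}(Q)] + \Pr[\mathsf{bad}_{pp[\mu]}(Q)] + \Pr[\mathsf{bad}_{lc[\lambda]}(Q)] + \Pr[\mathsf{bad}_{dp}(Q)] .$$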


Finishing the Proof. The following corollary wraps up what we have discussed
so far and finishes the proof of Theorem 3 with the help of earlier obtained bounds
(Lemmas and D). 


Corollary 3. Let $Q$ be generated adaptively, then for $F(Q)$ given in Prop. 9,
$\Pr[\mathsf{coll}(Q)] \le \Pr[\mathsf{coll}_{I}(Q) \wedge \neg\mathsf{bad}_{cl[\kappa]}(Q)] + \Pr[\mathsf{coll}_{II}(Q)] + \Pr[F(Q)]$.
6 Blockcipher-Based Instantiation


A naive replacement of the underlying PuRFs in Construction 1 with ideal
blockciphers leads to weaker security due to the availability of decryption queries
(see the full version for the justification). However, adding a layer of "Davies-Meyer"
suffices for our purposes. Note that there is no need to change $C^{\mathrm{PRE}}$; the
only modification is in $C^{\mathrm{POST}}$ (the proofs are given in the full version).


Construction 5. Let $E_1, E_2: \{0,1\}^{n} \times \{0,1\}^{n} \to \{0,1\}^{n}$ be two fixed randomly
(and independently) chosen blockciphers. Define a single-layer compression function
$H^{E_1,E_2}: \{0,1\}^{3n} \to \{0,1\}^{2n}$ by $C^{\mathrm{PRE}}: \mathbb{F}_{2^n}^3 \to (\mathbb{F}_{2^n}^2)^2$ from Construction 1
and $C^{\mathrm{POST}}: \mathbb{F}_{2^n}^5 \to \mathbb{F}_{2^n}^2$,

$$C^{\mathrm{POST}}(a,b,c,y_1,y_2) = A \cdot (a, c, a + y_1, c + y_2, (a + y_1)(c + y_2))^{T} ,$$

where A is a matrix satisfying certain non-degeneracy conditions.


Theorem 6. Let $H^{E_1,E_2}$ be given as in Construction 5, where $C^{\mathrm{PRE}}: \mathbb{F}_{2^n}^3 \to
(\mathbb{F}_{2^n}^2)^2$ is an arbitrary function that satisfies the completion property. Let $k, \mu, \gamma >
0$ and $\lambda > 3$ be integers. Then, for $\kappa = k\lambda + \mu$, $\mathsf{Adv}^{\mathrm{coll}}_{H}(q)$ is upper bounded by


m) 4(3) 


2" —q n — q) E (27 — g) ` 


q 


KY +27? +44 | 4(5) 
2” — q (27 — g)O-9 


Theorem 7. Let $H^{E_1,E_2}$ be given as in Construction 5, where $C^{\mathrm{PRE}}: \mathbb{F}_{2^n}^3 \to
(\mathbb{F}_{2^n}^2)^2$ is an arbitrary function that satisfies the completion property and let
$k > 1$ be an integer. Then


2 \ 9 2 
aago <22 (4) ( ) + E 


K 2” —q 27-—-qg Poq 
Abstract. This paper is about private data analysis, in which a trusted
curator holding a confidential database responds to real vector-valued 
queries. A common approach to ensuring privacy for the database ele- 
ments is to add appropriately generated random noise to the answers, 
releasing only these noisy responses. A line of study initiated in [7] ex- 
amines the amount of distortion needed to prevent privacy violations of 
various kinds. The results in the literature vary according to several pa- 
rameters, including the size of the database, the size of the universe from 
which data elements are drawn, the “amount” of privacy desired, and for 
the purposes of the current work, the arity of the query. In this paper 
we sharpen and unify these bounds. Our foremost result combines the 
techniques of Hardt and Talwar and McGregor et al. to obtain 
linear lower bounds on distortion when providing differential privacy for 
a (contrived) class of low-sensitivity queries. (A query has low sensitivity 
if the data of a single individual has small effect on the answer.) Several 
structural results follow as immediate corollaries: 


— We separate so-called counting queries from arbitrary low-sensitivity 
queries, proving the latter requires more noise, or distortion, than 
does the former; 

— We separate $(\varepsilon,0)$-differential privacy from its well-studied relaxation
$(\varepsilon,\delta)$-differential privacy, even when $\delta \in 2^{-o(n)}$ is negligible in
the size $n$ of the database, proving the latter requires less distortion
than the former;

— We demonstrate that $(\varepsilon,\delta)$-differential privacy is much weaker than
$(\varepsilon,0)$-differential privacy in terms of mutual information of the transcript
of the mechanism with the database, even when $\delta \in 2^{-o(n)}$ is
negligible in the size $n$ of the database.


We also simplify the lower bounds on noise for counting queries in 
and also make them unconditional. Further, we use a characterization 
of $(\varepsilon,\delta)$-differential privacy from to obtain lower bounds on the
distortion needed to ensure $(\varepsilon,\delta)$-differential privacy for $\varepsilon, \delta > 0$. We
next revisit the LP decoding argument of [IO] and combine it with a
recent result of Rudelson to improve on a result of Kasiviswanathan
et al. on noise lower bounds for privately releasing $\ell$-way marginals.


Keywords: Differential privacy, LP decoding. 
1 Introduction


This is a paper about private data analysis, in which a trusted curator holding 
a confidential database responds to real vector-valued queries. Specifically, we 
focus on the practice of ensuring privacy for the database elements by adding 
appropriately generated random noise to the answers, releasing only these noisy 
responses. A line of study initiated by Dinur and Nissim examines the amount 
of distortion needed to prevent privacy violations of various kinds [7]. Dinur and 
Nissim did not have a definition of privacy; rather, they had a notion that has 
come to be called blatant non-privacy; the modest goal, then, was to add enough 
distortion to avert blatant non-privacy. Since that time, the community has
raised the bar by defining (and achieving) powerful and comprehensive notions
of privacy [7,9,8], and the goal has been to preserve $(\varepsilon,0)$-differential privacy and
its relaxation, $(\varepsilon,\delta)$-differential privacy. A final goal considered herein, attribute
privacy, has a more complicated description, but may be thought of as preventing 
blatant non-privacy for a single data attribute in the presence of a certain 
kind of contingency table query. 

The results in the literature vary according to several parameters, including 
the number n of elements in the database, the size d of the universe from which 
data elements are drawn, the “amount” and type of privacy desired, and for 
the purposes of the current work, the arity k of the query. In this paper we 
strengthen and unify these bounds. 

As corollaries of our work, we obtain several “structural” results regarding 
different types of privacy guarantees: 


— We separate so-called counting queries from arbitrary low-sensitivity queries, 
proving the latter requires more noise, or distortion, than does the former; 

— We separate $(\varepsilon,0)$-differential privacy from its well-studied relaxation
$(\varepsilon,\delta)$-differential privacy, even when $\delta \in 2^{-o(n)}$ is negligible in the size $n$ of the
database, proving the latter requires less distortion than the former;

— We demonstrate that $(\varepsilon,\delta)$-differential privacy is much weaker than
$(\varepsilon,0)$-differential privacy in terms of mutual information of the transcript of the
mechanism with the database, even when $\delta \in 2^{-o(n)}$ is negligible in the size
$n$ of the database.


We also simplify the lower bounds on noise for counting queries in [IT] and also
make them unconditional, removing a technical assumption on the mechanism
present in their paper. Next, we use a characterization of $(\varepsilon,\delta)$-differential
privacy from to obtain lower bounds on the distortion needed to ensure
$(\varepsilon,\delta)$-differential privacy for $\varepsilon, \delta > 0$. We remark that [I2] also obtain quantitatively
similar lower bounds on the distortion required to maintain $(\varepsilon,\delta)$-differential
privacy for the class of $\ell$-way marginals, though their proof technique is very
different and arguably much more complicated.

After this, we use results of Rudelson and combine them with LP decoding
to show that attribute privacy is violated if $\ell$-way marginals are released with
at least a $1 - \eta$ fraction of these marginals released with $o(\sqrt{n})$ noise, for some
$\eta > 0$. The results and the technique in [2] required $\eta = 0$, making our results
more powerful. Finally, we extend the results of [7] to the case of small universe
size achieving stronger lower bounds to prevent blatant non-privacy. 

To describe our results even at a high level we must outline the privacy-preserving
database model, the notion of distortion or noise that may be employed
in order to preserve privacy, and the meaning of the goals of the adversary:
blatant non-privacy, violation of $(\varepsilon,0)$-differential privacy, violation of
$(\varepsilon,\delta)$-differential privacy, and attribute non-privacy.

Typically, the curator of a database receives questions to which it responds 
with potentially noisy answers. There are two possible settings here. One is that 
the queries are received by the curator one at a time. The other situation is that 
all the queries are received by the curator at once and it then publishes (noisy) 
answers to all of them at once. The former is called the interactive setting and 
the latter is called the non-interactive setting. All our lower bounds are in the 
non-interactive setting making them applicable to the interactive setting as well. 

We now formally describe a database and a query. A database $X$ is an
element of $(\mathbb{Z}^+)^d$. Here $d$ is called the universe size and intuitively refers to the
number of types of elements present in the database. Also, for a database $X$,
$n = \sum_{i=1}^{d} X_i$ is defined as the size of the database and refers to the number of
elements in the database. Note that we are representing databases as histograms.
A query (of arity $k$) is a map $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that $\forall i \in [k]$, $\forall x, y \in (\mathbb{Z}^+)^d$,
$|F(x+y)_i - F(x)_i| \le 1$ if $\|y\|_1 = 1$. In other words, every coordinate of the map
$F$ is 1-Lipschitz. We say $F$ is a counting query if $F$ is a linear map. The meaning
of $d, k, n$ throughout the paper shall be the same as above unless mentioned
otherwise.
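
As a minimal illustration of these definitions (a sketch that is not part of the
original text; the names `db_size` and `counting_query` are hypothetical), a
histogram database can be stored as a list of non-negative integers, and a
counting query as a $k \times d$ matrix whose entries lie in $[-1, 1]$, which keeps
every coordinate 1-Lipschitz:

```python
from typing import List

def db_size(x: List[int]) -> int:
    """Size n of a histogram database x in (Z^+)^d: the sum of its entries."""
    return sum(x)

def counting_query(A: List[List[float]], x: List[int]) -> List[float]:
    """Linear (counting) query F(x) = A x, one row of A per coordinate of F."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Universe of d = 3 element types: two elements of type 0, one of type 2.
x = [2, 0, 1]
A = [[1.0, 1.0, 0.0],    # entries in [-1, 1] keep each coordinate 1-Lipschitz
     [0.0, -1.0, 1.0]]

# Adding one element of type 1 (a vector y with ||y||_1 = 1) moves every
# coordinate of F by at most 1.
x_plus = [2, 1, 1]
diffs = [abs(u - v)
         for u, v in zip(counting_query(A, x_plus), counting_query(A, x))]
```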

We now formally introduce the definition of mechanism and privacy. 


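Definition 1. Let $\mathcal{F}$ be a family of queries such that $\forall F \in \mathcal{F}$, $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$.
Then, a mechanism is a map $M : (\mathbb{Z}^+)^d \times \mathcal{F} \to \mu(\mathbb{R}^k)$, where $\mu(\mathbb{R}^k)$ is simply the set of
probability distributions over $\mathbb{R}^k$. On being given a query $F \in \mathcal{F}$ and a database
$x \in (\mathbb{Z}^+)^d$, the curator samples $z$ from the probability distribution $M(x, F)$ and
returns $z$.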


We next state the definition of $\epsilon$-differential privacy (introduced by Dwork et al.
in [9]) and $(\epsilon, \delta)$-differential privacy (introduced by Dwork et al. in [8]).


Definition 2. For a family of queries $\mathcal{F}$, a mechanism $M : (\mathbb{Z}^+)^d \times \mathcal{F} \to \mu(\mathbb{R}^k)$
is said to be $\epsilon$-differentially private if for every $x, y \in (\mathbb{Z}^+)^d$ such that
$\|x - y\|_1 \le 1$, every measurable set $S \subseteq \mathbb{R}^k$ and $\forall F \in \mathcal{F}$, the following holds:
Let $M(x, F) = M_{x,F}$ and $M(y, F) = M_{y,F}$, and for a probability distribution $\Gamma$,
let $\Gamma(S)$ denote the probability of the set $S$ under $\Gamma$. Then,

$$2^{-\epsilon} \le \frac{M_{x,F}(S)}{M_{y,F}(S)} \le 2^{\epsilon}.$$

The mechanism is said to be $(\epsilon, \delta)$-differentially private if

$$2^{-\epsilon} \cdot M_{y,F}(S) - \delta \le M_{x,F}(S) \le 2^{\epsilon} \cdot M_{y,F}(S) + \delta.$$

Typically, $\delta$ is set to be negligible in $n, k$.

We remark that we do not define the notion of noise very precisely here as the
notion of noise depends on the context. However, in the context of differential
privacy, we use the following definition of noise.


Definition 3. For a family of queries $\mathcal{F}$, a mechanism $M : (\mathbb{Z}^+)^d \times \mathcal{F} \to \mu(\mathbb{R}^k)$
is said to add noise (at most) $\eta$ if with high probability (say 0.99) over the
randomness of $M$, $\|M(x, F) - F(x)\|_\infty \le \eta$.
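
As a concrete illustration of Definitions 2 and 3 (a sketch that is not taken from
the paper; all helper names are hypothetical), consider the standard Laplace
mechanism, which adds independent Laplace noise to each coordinate of the answer.
Assuming each of the $k$ coordinates is 1-Lipschitz, neighbouring databases change
the answer by at most $k$ in $\ell_1$ norm, so under the base-2 convention a
per-coordinate scale of $k/(\epsilon \ln 2)$ keeps the density ratio within $2^{\pm\epsilon}$:

```python
import math
import random
from typing import List, Sequence

def laplace_scale(k: int, eps: float) -> float:
    """Per-coordinate Laplace scale; the ln 2 accounts for the 2^eps convention."""
    return k / (eps * math.log(2))

def laplace_mechanism(answer: Sequence[float], eps: float,
                      rng: random.Random) -> List[float]:
    """Return the true answer F(x) plus i.i.d. Laplace noise on every coordinate."""
    b = laplace_scale(len(answer), eps)
    # The difference of two independent Exp(1) variables is a standard Laplace.
    return [a + b * (rng.expovariate(1.0) - rng.expovariate(1.0)) for a in answer]

def log2_density_ratio(z: Sequence[float], fx: Sequence[float],
                       fy: Sequence[float], eps: float) -> float:
    """log2 of (density of output z when the answer is fx) / (same for fy)."""
    b = laplace_scale(len(fx), eps)
    log_ratio = sum(abs(zi - v) - abs(zi - u) for zi, u, v in zip(z, fx, fy)) / b
    return log_ratio / math.log(2)

# True answers on two neighbouring databases (each coordinate differs by <= 1).
fx, fy = [2.0, 1.0], [3.0, 0.0]
noisy = laplace_mechanism(fx, 0.5, random.Random(1))
```

For any output `z`, the value `log2_density_ratio(z, fx, fy, eps)` lies in
$[-\epsilon, \epsilon]$, which is the pointwise form of the first inequality in Definition 2.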


While differential privacy is a very strong notion of privacy, sometimes one can 
show that even very modest definitions of privacy get violated. One such notion 
is that of blatant non-privacy. We say that a mechanism M for answering F 
over databases of size n and universe size d is blatantly non-private, if there is 
an attack A such that w.h.p. over the answer y returned by the mechanism M, 
A(y) differs from the database only at o(1) fraction of the places. Yet another 
very weak notion of privacy that is interesting to us is that of attribute non- 
privacy. The formal definition follows : 


Definition 4. For a query $F \in \mathcal{F}$, a mechanism $M : (\{0,1\}^d)^n \times \mathcal{F} \to \mathbb{R}^k$ is
said to be attribute non-private if there exists $Y \in (\{0,1\}^{d-1})^n$ and an algorithm
$A$ such that for every $x \in \{0,1\}^n$,

$$\Pr_{z \in M(Y \circ x, F)}\left[A(z) = x' : \|x - x'\|_1 = o(\|x\|_1)\right] \ge 1/10$$

where $Y \circ x$ simply denotes the obvious concatenation of $Y$ and $x$. $A$ need not be
computationally efficient, and the constant 1/10 is arbitrary and can be replaced
by any positive constant.


We show the following results : 


1. Combining techniques from [11] and [13], we obtain tight lower bounds on
the noise for arbitrary (non-counting) low-sensitivity queries for any $(\epsilon, 0)$-
differentially private mechanism. Given positive results of Blum, Ligett, and
Roth [3], this separates non-counting queries from counting queries, proving
that the former require more distortion than the latter for maintaining
differential privacy. Also, given the positive results of [8] for arbitrary
low-sensitivity queries, this separates $(\epsilon, \delta)$-differential privacy from
$(\epsilon, 0)$-differential privacy, where $\delta = \delta(n, k)$ denotes a function negligible in
its arguments. We also use this technique to show that the guarantees in terms
of information content are drastically weaker for an $(\epsilon, \delta)$-differentially private
protocol as compared to an $\epsilon$-differentially private protocol. Our technique
also simplifies the volume-based lower bounds on noise for counting queries
in [11]. In addition, we make the lower bounds unconditional. The lower
bound in [11] required the mechanism to be defined on “fractional” databases,
i.e., on $(\mathbb{R}^+)^d$ as opposed to just $(\mathbb{Z}^+)^d$, while we do not have any such
restrictions.

2. We give tight lower bounds on noise for ensuring $(\epsilon, \delta)$-differential privacy
for $\delta > 0$. This proof relies on a lemma due to [13] showing that $(\epsilon, \delta)$-
differentially private mechanisms yield a certain kind of unpredictable source.
On the other hand, any mechanism that is blatantly non-private cannot yield
an unpredictable source. Thus, if the noise is insufficient to prevent blatant
non-privacy, then it cannot provide $(\epsilon, \delta)$-differential privacy. We subsequently
use the lower bounds of [10] for preventing blatant non-privacy to
get lower bounds on the distortion for $(\epsilon, \delta)$-differential privacy.

3. We revisit the LP decoding attack of Dwork, McSherry, and Talwar [10],
observing that any linear query matrix yielding a Euclidean section suffices
for the attack. The LP decoding attack succeeds even if a certain constant
fraction of the responses have wild noise. Armed with the connection to
Euclidean sections, and a recent result of Rudelson [15] bounding from below
the least singular value of the Hadamard product of certain i.i.d. matrices,
we qualitatively strengthen a lower bound of Kasiviswanathan, Rudelson,
Smith, and Ullman [12] on the noise needed to avert attribute non-privacy in
$\ell$-way marginals release by making the attack resilient to a constant fraction
of wild responses.


There is an extension of the results of [7] to the case when the size of the
universe is smaller than the size of the database, which can be found in the
full version of this paper [5].


2 Lower Bound by Volume Arguments 


We now recall the volume-based argument of Hardt and Talwar [11] to show
lower bounds on the noise required for $\epsilon$-differential privacy.


Theorem 1. Assume $x_1, \ldots, x_{2^s} \in (\mathbb{Z}^+)^d$ such that $\forall i$, $\|x_i\|_1 \le n$ and for
$i \ne j$, $\|x_i - x_j\|_1 \le \Delta$. Further, let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be such that for any $i \ne j$,
$\|F(x_i) - F(x_j)\|_\infty \ge \eta$. If $\Delta < (s-1)/\epsilon$, then any mechanism which is $\epsilon$-
differentially private for the query $F$ on databases of size $n$ must add noise $\eta/2$.


While the line of reasoning in the proof is the same as that of [11], we give the
proof here as the argument in [11] works only for counting queries, i.e., when $F$
is a linear transformation. On the other hand, the statement and proof of our
result work for any query $F$.


Proof. Consider the $\ell_\infty$ balls of radius $\eta/2$ around each of the $F(x_i)$. By the
hypothesis, these balls are disjoint. Now assume there is a mechanism $M$ which adds
noise less than $\eta/2$, and consider any $x_i$. Then, because all the balls are disjoint,
there is some $j \ne i$ such that if $S$ is the $\ell_\infty$ ball of radius $\eta/2$ around $F(x_j)$,
then

$$\Pr_{z \in M(x_i, F)}[z \in S] \le 2^{-s}.$$

However, we can also say that because the noise added by the mechanism $M$ is
less than $\eta/2$,

$$\Pr_{z \in M(x_j, F)}[z \in S] \ge 1/2.$$

Also, because the mechanism $M$ is $\epsilon$-differentially private and $\|x_i - x_j\|_1 \le \Delta$,

$$\frac{\Pr_{z \in M(x_i, F)}[z \in S]}{\Pr_{z \in M(x_j, F)}[z \in S]} \ge 2^{-\epsilon \Delta}.$$

The first two bounds force this ratio to be at most $2^{-(s-1)}$, which leads to a
contradiction if $\Delta < (s-1)/\epsilon$, thus proving the assertion.
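
The arithmetic behind this contradiction can be checked numerically. The following
is a small sketch (the function name and the sample parameters are hypothetical,
chosen only for illustration) that compares the largest ratio permitted by the
two probability bounds with the smallest ratio permitted by differential privacy:

```python
def volume_argument_contradicts(s: int, eps: float, dist: float) -> bool:
    """True when the three bounds in the proof cannot hold simultaneously.

    s    : there are 2^s databases with disjoint answer balls
    eps  : privacy parameter (base-2 convention)
    dist : upper bound Delta on the pairwise L1 distance of the databases
    """
    upper = 2.0 ** (-s) / 0.5       # largest possible Pr_i[S] / Pr_j[S]
    lower = 2.0 ** (-eps * dist)    # smallest ratio allowed by privacy
    return lower > upper

# With s = 10 and eps = 0.5 the threshold is Delta = (s - 1) / eps = 18.
below_threshold = volume_argument_contradicts(10, 0.5, 16)
at_threshold = volume_argument_contradicts(10, 0.5, 18)
```

Here `below_threshold` is `True` (a mechanism with such small noise cannot be
private), while `at_threshold` is `False` because the two sides coincide exactly
at $\Delta = (s-1)/\epsilon$.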


2.1 Linear Lower Bound for Arbitrary Queries 


In this subsection, we prove the following theorem. 


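Theorem 2. For any $k, d, n \in \mathbb{N}$ and $1/40 > \epsilon > 0$, where $n \ge \min\{k/\epsilon, d/\epsilon\}$,
there is a query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that any mechanism $M$ which is $\epsilon$-
differentially private adds noise $\Omega(\min\{d/\epsilon, k/\epsilon\})$.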

If e > 1, then there is a query F : (Z+)? — R* such that any mechanism 
M which is e-differentially private adds noise Q(min{d/(e-2°°),k/e}) as long as 
n > min{k/e,d/(e-2°*)} 


Before starting the proof, we make a couple of observations. First, note that
the statement of the theorem does not give any lower bound for $1 \ge \epsilon \ge 1/40$.
However, any mechanism which is $\epsilon$-differentially private for $\epsilon$ in this
range is also $\epsilon'$-differentially private for $\epsilon' = 10/9$. Hence, the noise lower
bounds for $\epsilon'$-differential privacy with $\epsilon' = 10/9$ also apply for $\epsilon$ in this
range. It is easy to see that, up to constant factors, the lower bounds
with $\epsilon' = 10/9$ are optimal for $\epsilon$ in this range.

Secondly, the Laplacian mechanism maintains $\epsilon$-differential privacy while adding
only $O(k/\epsilon)$ noise. Also, because the databases are of size $n$, it is enough to add
noise $O(n)$ to maintain $\epsilon$-differential privacy for any $\epsilon > 0$. Thus, as long as
$k = O(d)$, our lower bounds are tight up to constant factors. Next, we give the
proof of Theorem 2.

Also, in the subsequent proofs, the databases shall be constructed in clever 
ways. The full details of these constructions can be found in [5]. We will be 
referring to the appropriate claims whenever necessary. 


Proof. Our proof strategy is to construct a set of databases and a query which
meet the conditions stated in the hypothesis of Theorem 1 and then get the
desired lower bound on the noise. We first deal with the case when $0 < \epsilon < 1/40$.
Let $\ell = \min\{d, k\}$. We can now use Claim A.2 in [5] to construct $2^s$ databases
$x_1, \ldots, x_{2^s}$ (for $s = \ell/400$) such that $x_i \in (\mathbb{Z}^+)^d$ with the property that
$\forall i \ne j$, $\|x_i - x_j\|_1 \ge n'/10$ and $\|x_i\|_1 \le n'$ where $n' = \ell/(1280\epsilon)$ (the application of
Claim A.2 uses $d' = \ell/320$). Note that our databases are of size bounded by
$n' \le n$. We now describe a mapping $\mathcal{L} : (\mathbb{Z}^+)^d \to \mathbb{R}^{2^s}$ which is related to a
construction in [13]. The mapping is as follows:


– For every $x_i$, there is a coordinate $i$ in the mapping.
– The $i$-th coordinate of $\mathcal{L}(z)$ is $\max\{n'/30 - \|x_i - z\|_1, 0\}$.


Claim. The map $\mathcal{L}$ is 1-Lipschitz, i.e., if $\|z_1 - z_2\|_1 = 1$, then $\|\mathcal{L}(z_1) - \mathcal{L}(z_2)\|_1 \le 1$.

Proof. We observe that for any $z_1, z_2$ such that $\|z_1 - z_2\|_1 \le 1$, if $A$ denotes the
set of coordinates where at least one of $\mathcal{L}(z_1)$ or $\mathcal{L}(z_2)$ is non-zero, then $A$
is either empty or a singleton set. Given this, the statement in the claim is
obvious, since the mapping corresponding to any particular coordinate is clearly
1-Lipschitz.


We now describe the queries. Corresponding to any $r \in \{-1, 1\}^{2^s}$, we define
$f_r : (\mathbb{Z}^+)^d \to \mathbb{R}$ as

$$f_r(x) = \sum_{i=1}^{2^s} \mathcal{L}(x)_i \cdot r_i.$$

Now, we define a random map $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ as follows. Pick $r_1, \ldots, r_k \in
\{-1, 1\}^{2^s}$ independently and uniformly at random and define $F$ as follows:

$$F(x) = (f_{r_1}(x), \ldots, f_{r_k}(x)).$$


Now consider any $x_h, x_j \in S$ such that $h \ne j$, where $S = \{x_1, \ldots, x_{2^s}\}$. Because
of the way $\mathcal{L}$ is defined, it is clear that for any $r_i$,

$$\Pr\left[|f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15\right] \ge 1/2.$$

A basic application of the Chernoff bound implies that

$$\Pr_{r_1, \ldots, r_k}\left[\text{for at least } 1/10 \text{ of the } r_i\text{'s},\ |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15\right] \ge 1 - 2^{-k/30}.$$

Now, note that the total number of pairs $(x_h, x_j)$ of databases such that $x_h, x_j \in S$
is at most $2^{2s} \le 2^{\ell/200} \le 2^{k/200}$. This implies (via a union bound)

$$\Pr_{r_1, \ldots, r_k}\left[\forall h \ne j,\ \text{for at least } 1/10 \text{ of the } r_i\text{'s},\ |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15\right] \ge 1 - 2^{-k/40}.$$

This implies that we can fix $r_1, \ldots, r_k$ such that the following is true:

$$\forall h \ne j,\ \text{for at least } 1/10 \text{ of the } r_i\text{'s},\ |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15.$$

This implies that for any $x_h \ne x_j \in S$, $\|F(x_h) - F(x_j)\|_\infty \ge n'/15$. In fact,
$\|F(x_h) - F(x_j)\|_2 \ge n'\sqrt{k}/150$, which is a much stronger statement than what
we require and is quantitatively similar to the results in [11], where they consider
$\ell_2$ noise as opposed to $\ell_\infty$ noise.

We can now apply Theorem 1 by putting $\Delta = 2n'$, $s = \ell/400 > 3\epsilon n'$ and
$\eta = n'/15$, and observe that $\Delta < (s-1)/\epsilon$, thus proving the result.
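
The construction of the map and of the sign queries can be tried out on a toy
instance. The sketch below is only illustrative: the centers, the radius and the
number of queries are hypothetical small numbers (not the parameters $n'/30$ and
$n'/10$ of the proof), and the function names are not from the paper.

```python
import itertools
import random
from typing import List, Sequence

def l1(u: Sequence[int], v: Sequence[int]) -> int:
    """L1 distance between two integer vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def bump_map(z: Sequence[int], centers, radius: int) -> List[int]:
    """Coordinate i is max(radius - ||x_i - z||_1, 0): a bump around center x_i."""
    return [max(radius - l1(x, z), 0) for x in centers]

def sign_query(z: Sequence[int], centers, radius: int, r: Sequence[int]) -> int:
    """f_r(z) = sum_i L(z)_i * r_i for a sign vector r in {-1, +1}^(#centers)."""
    return sum(c * ri for c, ri in zip(bump_map(z, centers, radius), r))

# Three centers at pairwise L1 distance 12 and bumps of radius 2, so bumps
# around different centers never overlap.
centers = [(6, 0, 0), (0, 6, 0), (0, 0, 6)]
radius = 2

rng = random.Random(0)
signs = [[rng.choice((-1, 1)) for _ in centers] for _ in range(8)]  # k = 8

def F(z: Sequence[int]) -> List[int]:
    """The query F(z) = (f_{r_1}(z), ..., f_{r_k}(z))."""
    return [sign_query(z, centers, radius, r) for r in signs]

def is_one_lipschitz_on_grid(side: int) -> bool:
    """Check ||L(z) - L(z')||_1 <= 1 for all grid neighbours z, z' = z + e_i."""
    for z in itertools.product(range(side), repeat=len(centers[0])):
        for i in range(len(z)):
            z2 = list(z)
            z2[i] += 1
            if l1(bump_map(z, centers, radius), bump_map(z2, centers, radius)) > 1:
                return False
    return True
```

On this instance, each query separates two centers by exactly twice the radius
whenever the corresponding two signs differ, which happens with probability 1/2,
mirroring the first probability bound in the proof.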

We next deal with the case when $\epsilon > 1$. This part of the proof differs from the
case when $\epsilon < 1$ only in the construction of $x_1, \ldots, x_{2^s}$. We also emphasize that
had we not insisted on integral databases, our proof would have been identical
to the first part. We construct the databases $x_1, \ldots, x_{2^s}$ using combinatorial
designs. More precisely, for some sufficiently large constant C, let @ = min{d/(C- 
2°¢), k}. We can now use Claim A.3 from [5] to construct 2° databases z1, .. . , 2s 
(for $s = \ell/400$) such that $x_i \in (\mathbb{Z}^+)^d$ with the property that $\forall i \ne j$, $\|x_i - x_j\|_1 \ge
n'/10$ and $\|x_i\|_1 \le n'$ where $n' = \ell/(1280\epsilon)$ (using $d' = \ell/320$ in Claim A.3).
Again, we note here that the databases constructed are of size at most $n'$.

From this point onwards, we define the map $\mathcal{L}$ and the query $F$ as we did in
the first part of the proof, and the proof proceeds identically. In particular, we get a
query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that for any $i \ne j$, $\|F(x_i) - F(x_j)\|_2 \ge n'\sqrt{k}/150$. As
before, we can now apply Theorem 1 by putting $\Delta = 2n'$, $s = \ell/400 > 3\epsilon n'$
and $\eta = n'/15$, and observe that $\Delta < (s-1)/\epsilon$, thus proving the result.


For the subsequent part of this paper, we only consider lower bounds on $\epsilon$-
differential privacy for $0 < \epsilon < 1$ as opposed to $\epsilon > 1$. This is because the
privacy guarantees one gets become meaningless when $\epsilon$ is large. However, we
do remark that the results can be carried over in a straightforward way to the
regime of $\epsilon > 1$ using combinatorial designs (as we did for Theorem 2).


Consequences of the Linear Lower Bound. We briefly describe two
consequences of the linear lower bound on the noise proven in Theorem 2. The
first is a separation of counting queries from non-counting queries. While our
separation gives quantitatively the same results as long as $d = k^{O(1)}$ and $n = O(k/\epsilon)$,
for simplicity, we consider the setting when $k = d$ and $n = k/\epsilon$. In this case,
Theorem 2 shows the existence of a (non-counting) query such that maintaining $\epsilon$-
differential privacy requires noise $\Omega(n)$. On the other hand, [3] had proven that
for any counting query with the same setting of parameters, there is a mechanism
which adds noise $\tilde{O}(n^{2/3})$ and maintains $\epsilon$-differential privacy. This shows that
maintaining $\epsilon$-differential privacy inherently requires more distortion in the case of
non-counting queries than counting queries.

The next consequence is a separation of $(\epsilon, \delta)$-differential privacy from $(\epsilon, 0)$-
differential privacy for $\delta = 2^{-o(n)}$. We note that Hardt and Talwar [11] had
shown such a separation, but that was only when $k = O(\log n)$ and $\delta = n^{-O(1)}$.
Again, we use the setting of parameters when $k = d$ and $n = k/\epsilon$. The Gaussian
mechanism of [8] shows that to maintain $(\epsilon, \delta)$-differential privacy for any $k$
queries, it suffices to add noise $O(\sqrt{k \log(1/\delta)}/\epsilon) = o(n)$. However, Theorem 2
shows that there is a query which requires adding noise $\Omega(n)$ to maintain $(\epsilon, 0)$-
differential privacy.

The last consequence of our result is more indirect and is explained next. 


2.2 Information Loss in Differentially Private Protocols 


In [13], a connection was established between differentially private protocols and
the notion of mutual information from information theory. In fact, as [13] was
dealing with 2-party protocols, the connection was actually between differentially
private protocols and the notion of information content [1,2], which is a symmetric
variant of mutual information useful in 2-party protocols. In that paper, it was
shown that the information content (which simplifies to mutual information
in our setting) between the transcript of an $\epsilon$-differentially private mechanism and
the database vector is bounded by $O(\epsilon n)$. Using the construction from the
previous subsection, we show that in the case of $(\epsilon, \delta)$-differentially private protocols
(for any $\delta = 2^{-o(n)}$), there is no non-trivial bound on the mutual information
between the transcript of the mechanism and the database vector. Thus, as far
as information-theoretic guarantees go, the situation is drastically different for
pure differentially private protocols vis-a-vis approximately differentially private
protocols. The contents of this subsection are a result of personal communication
between the author and Salil Vadhan [6].

We first define the notion of mutual information (can be found in standard 
information theory textbooks). 


Definition 5. Given two random variables $X$ and $Y$, their mutual information
$I(X; Y)$ is defined as

$$I(X; Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y)$$

where $H(X)$ denotes the Shannon entropy of $X$.
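
For small finite distributions, the quantity in Definition 5 can be computed
directly. The following sketch (helper names are hypothetical) evaluates
$I(X; Y) = H(X) + H(Y) - H(X, Y)$ from a joint distribution given as a dictionary:

```python
import math
from typing import Dict, Hashable, Tuple

def entropy(p: Dict[Hashable, float]) -> float:
    """Shannon entropy (in bits) of a distribution {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def mutual_information(joint: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution {(x, y): prob}."""
    px: Dict[Hashable, float] = {}
    py: Dict[Hashable, float] = {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q   # marginal of X
        py[y] = py.get(y, 0.0) + q   # marginal of Y
    return entropy(px) + entropy(py) - entropy(joint)

# X is a uniform bit. A transcript Y = X reveals it completely (1 bit), while
# an independent transcript reveals nothing (0 bits).
revealing = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
```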


The next claim establishes an upper bound on the mutual information between 
transcript of a differentially private protocol and the database vector. 


Claim. Let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be a query and $M : (\mathbb{Z}^+)^d \to \mu(\mathbb{R}^k)$ be an $\epsilon$-
differentially private protocol for answering $F$ for databases of size $n$. If $X$ is a
distribution over the inputs in $(\mathbb{Z}^+)^d$, then $I(M(X); X) \le 3\epsilon n$.


Proof. We first note that since the databases are of size bounded by $n$,
instead of assuming that $\mu$ is a distribution over the inputs $X \in (\mathbb{Z}^+)^d$, we can
assume that $\mu$ is a distribution over the inputs $X \in [n]^d$ where $[n] = \{0, 1, \ldots, n\}$.
Now, we can apply Proposition 7 from [13]. We note that the aforesaid
proposition is in terms of information content for 2-party protocols, but we observe
that we can simply make the second party's input a constant and get that
$I(M(X); X) \le 3\epsilon n$.


Next, we state the following claim which says that for $(\epsilon, \delta)$-differentially private
protocols, even for an exponentially small $\delta$, the mutual information between the
transcript and the input can be as large as $n(1 - \eta)$ for any value of $0 < \epsilon, \eta < 1$. In
other words, an $(\epsilon, \delta)$-differentially private protocol does not imply any effective
bound on the mutual information between the input and the transcript even as
$\epsilon \to 0$ and $\delta$ is exponentially small.


Lemma 1. For $n \in \mathbb{N}$ and $0 < \epsilon, \eta < 1$, there is a constant $C = C(\epsilon, \eta) > 0$ and
a distribution $X$ over $(\mathbb{Z}^+)^n$ with support over databases of size $n$, a query
$F : (\mathbb{Z}^+)^n \to \mathbb{R}^k$ and an $(\epsilon, \delta)$-differentially private protocol $M$ for answering $F$
such that $I(X; M(X)) \ge n(1 - 2\eta)$ if $\delta \ge 2^{-C(\epsilon, \eta) n}$.


Proof. We first construct $2^s$ vectors in $\{0,1\}^n$ (for $s = n(1 - \eta)$) with the
property that for any $x_i, x_j$ ($i \ne j$), $\|x_i - x_j\|_1 \ge \eta^2 n/8$. It is easy to guarantee
the existence of such a set of vectors by a simple application of the probabilistic
method. The distribution $X$ is simply the uniform distribution over the set
$\{x_1, \ldots, x_{2^s}\}$. By construction, all the databases in $X$ are of size bounded by $n$.

Next, we define the query $F : (\mathbb{Z}^+)^n \to \mathbb{R}^k$ in the same way as
the query $F$ in the proof of Theorem 2. Following exactly the same calculations,
we can show that if we set $k = 80n$, we get a query $F : (\mathbb{Z}^+)^n \to \mathbb{R}^k$ such
that for any $i \ne j$, $\|F(x_i) - F(x_j)\|_2 \ge \eta^2 n \sqrt{k}/50$. We now recall the Gaussian
mechanism of [8], which maintains $(\epsilon, \delta)$-differential privacy.

Lemma 2. [8] Let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be a query. Let $Y = (Y_1, \ldots, Y_k)$ be a
distribution over $\mathbb{R}^k$ such that each $Y_i$ is an i.i.d. $N(0, \sigma)$ random variable. Here
$\sigma^2 = k \log(1/\delta)/\epsilon^2$. Then the mechanism $M$ which, for a database $x$ and query $F$,
samples $Y_0$ from $Y$ and responds with $F(x) + Y_0$ is an $(\epsilon, \delta)$-differentially
private mechanism.


Note that for the above mechanism $M$ and database $x$, if $z$ is sampled from
$M(x)$, then the distribution of $z - F(x)$ is the same as $(Y_1, \ldots, Y_k)$ where each
$Y_i$ is an i.i.d. $N(0, \sigma)$ random variable. Thus,

$$\|M(x) - F(x)\|_2^2 \sim Y_1^2 + \ldots + Y_k^2.$$

As the following fact shows, the distribution on the right-hand side is
concentrated around its mean. The fact is possibly well known, but we could not find a
reference and hence we prove it in Appendix C in [5].


Fact 3. If $Y_1, \ldots, Y_k$ are i.i.d. $N(0, \sigma)$ random variables, then,


Pr [Y2 +... +Y S204 6) hoe] <27 
Yisa Yh 


Using the above fact, we get 


<33 
e2 E 


Pr Iimo - F(x)\|2 > ee -6k 


Here the probability is over the randomness of the mechanism. Putting € = 1 
and 6 = 2~C(©")" for an appropriate constant C(e, n), we get that 


env 
200 


Pr mo — F(a)|l2 > | aul 


As we know, for any $i \ne j$, $\|F(x_i) - F(x_j)\|_2 \ge \eta^2 n \sqrt{k}/50$. Hence, with
probability at least $1 - 2^{-n}$ over the randomness of the mechanism, for any database
$x_i \in \mathrm{supp}(X)$, if $y$ is sampled from $M(x_i)$,

$$\forall j \ne i, \quad \|F(x_j) - y\|_2 > \|F(x_i) - y\|_2.$$

Thus, for any $x_i$, given $M(x_i)$, we can recover $x_i$ with high probability and hence
we can say

$$\Pr_{y \in M(X)}\left[H(X \mid M(X) = y) = 0\right] \ge 1 - 2^{-n}.$$

This means that

$$H(X \mid M(X)) \le 2^{-n} \cdot n \le 1.$$

Recall that $I(X; M(X)) = H(X) - H(X \mid M(X)) \ge H(X) - 1 = (1 - \eta)n - 1 \ge
(1 - 2\eta)n$. This completes the proof of Lemma 1.


3 Lower Bound on Noise for Counting Queries


In the last section, we proved that to preserve $\epsilon$-differential privacy for $k$ queries,
one may need to add $\Omega(k/\epsilon)$ noise provided $d, n \gg k$. However, these queries were
not counting queries. It is interesting to derive lower bounds on the noise required
to preserve privacy for counting queries, as these are the queries mostly used
in practice. While one might initially hope to prove a similar lower bound for
counting queries, [3] states that there is an $\epsilon$-differentially private mechanism
which adds $O(n^{2/3}/\epsilon)$ noise per query and can answer $O(n)$ counting queries
(when $d = n$).

Still, Hardt and Talwar [11] showed that to answer $k$ counting queries, any
mechanism which is $\epsilon$-differentially private must add $\Omega(\min\{k/\epsilon, \sqrt{k \log(d/k)}/\epsilon\})$
noise (in fact, this is true for $k$ random queries). However, [11] makes a
technical assumption that the mechanism has a smooth extension which works for
“fractional” databases as well. In other words, they require the domain of the
mechanism to be $(\mathbb{R}^+)^d$ as opposed to $(\mathbb{Z}^+)^d$. However, it is not clear if this is
always true, i.e., if given a mechanism which is defined only over true (integral)
databases, one can get a mechanism which is defined over “fractional” databases
with similar privacy guarantees.

Next, we prove the same result without making any such technical
assumptions. Again, our constructions are dependent on combinatorial designs [14].
First, we prove the following simple but useful claim.


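Claim. Let $a \in \mathbb{Z}$ and assume $x_1, x_2, \ldots, x_{2^s} \in (\mathbb{Z}^+)^d$ such that $\forall i$, every entry
of $x_i$ is either $0$ or $a$. Also, for every $i \ne \ell$, $\|x_i - x_\ell\|_1 \ge \Delta$. Then, for $k \ge 20s$,
there is a linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that for every $i, \ell \in [2^s]$ with $i \ne \ell$,
the following holds:

$$\Pr_{j \in [k]}\left[|F(x_i)_j - F(x_\ell)_j| \ge \Delta'/10\right] \ge 1/40$$

where $\Delta' = \sqrt{\Delta \cdot a}$.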


Proof. Consider any $x_i, x_\ell$ such that $i \ne \ell$. Note that $z$ defined as $z = x_i - x_\ell$
is such that all its entries are $0$ or $\pm a$, and that $z$ has at least $\Delta/a$
non-zero entries. If we choose $r \in \{-1, 1\}^d$ uniformly at random, then note that

$$Y = \sum_{i=1}^{d} z_i \cdot r_i = \sum_{i : z_i \ne 0} z_i \cdot r_i.$$

Note that the total number of summands is $\ell' \ge \Delta/a$, and hence the distribution
of the random variable $Y$ is the same as choosing $r' \in \{-1, 1\}^{\ell'}$ uniformly at
random and considering the random variable

$$Y' = a \cdot \sum_{i=1}^{\ell'} r'_i.$$


However, using Corollary B.2 from [5], we get
VA-a = / A/a 9
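Now, let us choose $r'_1, \ldots, r'_k$ uniformly and independently at random from
$\{-1, 1\}^d$ and consider the linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ defined as

$$F(x) = (\langle r'_1, x \rangle, \ldots, \langle r'_k, x \rangle).$$

Set $\Delta' = \sqrt{\Delta \cdot a}$. Now, (1) and an application of the Chernoff bound imply that
for any $x_i, x_\ell$ ($i \ne \ell$)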


Pr | Pr |F; — F(ze);| > 4/10] > 1/40] a ee a a 
A r! jE 


We now observe that the total number of pairs $(x_i, x_\ell)$ ($i \ne \ell$) is at most
$2^{2s} \le 2^{k/10}$. Applying a union bound, we get that there is some choice of
$r'_1, \ldots, r'_k$ (and hence a fixed $F$) such that for all $i \ne \ell$,

$$\Pr_{j \in [k]}\left[|F(x_i)_j - F(x_\ell)_j| \ge \Delta'/10\right] \ge 1/40.$$
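
The random sign query used in this proof is easy to simulate. The sketch below
(hypothetical helper names and toy parameters, not the values used in the paper)
draws $k$ sign vectors, applies the resulting linear query to two databases whose
entries are $0$ or $a$, and measures the fraction of coordinates that separate them
by at least $\Delta'/10$:

```python
import math
import random
from typing import List, Sequence

def random_sign_query(k: int, d: int, rng: random.Random) -> List[List[int]]:
    """k rows drawn independently and uniformly from {-1, +1}^d."""
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(k)]

def apply_query(rows: Sequence[Sequence[int]], x: Sequence[int]) -> List[int]:
    """F(x)_j = <r'_j, x>, a linear (counting) query."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in rows]

def separated_fraction(rows, x, y, threshold: float) -> float:
    """Fraction of coordinates j with |F(x)_j - F(y)_j| >= threshold."""
    fx, fy = apply_query(rows, x), apply_query(rows, y)
    return sum(abs(u - v) >= threshold for u, v in zip(fx, fy)) / len(rows)

# Toy instance: entries are 0 or a = 4; x and y differ in all 16 positions,
# so Delta = ||x - y||_1 = 64 and Delta' = sqrt(Delta * a) = 16.
a, d, k = 4, 16, 200
x = [a] * 8 + [0] * 8
y = [0] * 8 + [a] * 8
delta_dist = sum(abs(u - v) for u, v in zip(x, y))
threshold = math.sqrt(delta_dist * a) / 10

rows = random_sign_query(k, d, random.Random(0))
fraction = separated_fraction(rows, x, y, threshold)
```

In this toy setting a coordinate fails to separate `x` and `y` only when the 16
relevant signs cancel exactly, so the measured fraction is typically far above the
$1/40$ required by the claim.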

We now prove a lower bound on the noise required to maintain privacy for
random counting queries. As we have said before, Hardt and Talwar [11] proved
the same result under an additional assumption that the mechanism defined over
integral databases can be smoothly extended to fractional databases as well.


Theorem 4. For every $k, d \in \mathbb{N}$ and $1 > \epsilon > 0$, there is a counting query
$F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that any mechanism which maintains $\epsilon$-differential
privacy adds noise $\Omega(\min\{k/\epsilon, \sqrt{k \log(d/k)}/\epsilon\})$. The size of the database is
$n = O(k/\epsilon)$.


Proof. The proof strategy is to come up with databases meeting the hypothesis
of the Claim above and use the Claim to get a counting query $F$. We then use
Theorem 1 to get a lower bound on the distortion required by any private
mechanism to answer $F$. We consider two cases: $k \le \log d$ and $k > \log d$.

The first case is trivial. Namely, consider databases $x_1, \ldots, x_{2^{k/20}}$ such that
each $x_i = \lfloor k/(80\epsilon) \rfloor \cdot e_i$ where $e_i$ is the standard unit vector in the $i$-th direction.
This is possible as there are $d \ge 2^k$ different unit vectors. Note that for any
$i \ne \ell$, $\|x_i - x_\ell\|_1 = 2 \cdot \lfloor k/(80\epsilon) \rfloor$. We can now apply the Claim and get that there
is a linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ (using $\Delta = 2 \cdot \lfloor k/(80\epsilon) \rfloor$ and $a = \lfloor k/(80\epsilon) \rfloor$)
such that

$$\Pr_{j \in [k]}\left[|F(x_i)_j - F(x_\ell)_j| \ge \frac{\sqrt{2}}{10} \lfloor k/(80\epsilon) \rfloor\right] \ge 1/40.$$

We see that there are $2^{k/20} = 2^s$ databases which differ by exactly $2 \cdot \lfloor k/(80\epsilon) \rfloor =
\Delta$. Note that $\Delta < (s-1)/\epsilon$. Hence we can apply Theorem 1 to note that to
maintain $\epsilon$-differential privacy, any mechanism needs to add $k/(800\epsilon)$ noise. In
fact, we note that the $\ell_2$ error of the answer returned by the mechanism needs
to be $\Omega(k^{3/2}/\epsilon)$, which is quantitatively the same as the result in [11].

The second case is slightly more complicated. We use Claim A.1 from [5] to
construct $x_1, \ldots, x_{2^{k/20}} \in (\mathbb{Z}^+)^d$ with the following properties:


– Every entry of any of the $x_i$'s is either $0$ or $a \in \mathbb{Z}$ such that $a \ge \log(d/k)/(160\epsilon)$.
– $\forall i$, $\|x_i\|_1 \le k/(80\epsilon)$ and $\forall i \ne j$, $\|x_i - x_j\|_1 \ge k/(160\epsilon)$.


Again, we can apply the Claim and get that there is a linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$
(using $\Delta \ge k/(160\epsilon)$ and $a \ge \log(d/k)/(160\epsilon)$) such that $\forall i \ne \ell$,

$$\Pr_{j \in [k]}\left[|F(x_i)_j - F(x_\ell)_j| \ge \frac{\sqrt{k \log(d/k)}}{1600\epsilon}\right] \ge 1/40.$$

Again, we have $2^{k/20}$ databases which differ by at most $k/(40\epsilon)$, and hence we can
apply Theorem 1 to get that to maintain $\epsilon$-differential privacy, any mechanism
needs to add $\Omega\left(\sqrt{k \log(d/k)}/\epsilon\right)$ noise.


4 Lower Bounds for Approximate Differential Privacy 


In this section, we prove lower bounds on the noise required to maintain $(\epsilon, \delta)$-
differential privacy for $\epsilon, \delta > 0$. Our lower bounds are valid for any positive $\delta$
and are in fact tight for constant $\epsilon$ and $\delta$. We note that a quantitatively similar
lower bound was proven for the class of $\ell$-way marginals by [12], though our proof
(for random queries) is arguably much simpler.

In this section, we consider databases which are elements of $\{0, 1\}^n$, or in other
words we consider the case when the universe size is $d = n$ and the databases
are allowed to have at most one element of each type. We note that restricting
databases to bit vectors is a well-considered model in the literature, including


KE] among others. 


We prove the following theorem. 


Theorem 5. For any $n \in \mathbb{N}$, $\epsilon > 0$ and $1/20 > \delta > 0$, there exist positive
constants $\alpha$, $\gamma$ and $\eta$ such that there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ with
$k = \alpha n$ such that any mechanism $M$ that satisfies

$$\Pr\left[\Pr_{i \in [k]}\left[|M(x, F)_i - F(x)_i| \le \eta\sqrt{n}\right] \ge 1/2 + \gamma\right] \ge 3\sqrt{\delta}$$

is not $(\epsilon, \delta)$-differentially private. In other words, any mechanism $M$ which with
significant probability, i.e., $3\sqrt{\delta}$, answers at least a $1/2 + \gamma$ fraction of the $k$ queries
with at most $\eta\sqrt{n}$ noise is not $(\epsilon, \delta)$-differentially private.

An immediate corollary is that there exists a positive constant $\alpha$ and a counting
query $F : \{0,1\}^n \to \mathbb{R}^k$ where $k = \alpha n$ such that any mechanism which adds
$o(\sqrt{n})$ noise is not $(\epsilon, \delta)$-differentially private for $\epsilon > 0$ and $\delta < 1/20$.

To prove Theorem 5, we first need to introduce some definitions
previously discussed in [13]. We do note that that paper deals with the two-
party setting, but the relevant definitions and the lemma we use here easily extend
to the standard (curator-client) setting of privacy.


Definition 6. A random variable $Y = (Y_1, \ldots, Y_{i-1}, Y_i, Y_{i+1}, \ldots, Y_n) \in \{0,1\}^n$
is said to be a $\delta$-approximate strongly $\alpha$-unpredictable bit source (for $\alpha \ge 1$) if with
probability $1 - \delta$ over $i \in [n]$,

$$\frac{1}{\alpha} \le \frac{\Pr[Y_i = 1 \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}, Y_{i+1} = y_{i+1}, \ldots, Y_n = y_n]}{\Pr[Y_i = 0 \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}, Y_{i+1} = y_{i+1}, \ldots, Y_n = y_n]} \le \alpha.$$


The next lemma (proven in [13] for the two-party setting) roughly says that for
any $(\epsilon, \delta)$-private mechanism, conditioned on the transcript of the mechanism, the
distribution of the database is a $\delta$-approximate strongly $2^{\epsilon}$-unpredictable source.
More precisely, we have the following lemma.


Lemma 3. Let $F : \{0,1\}^n \to \mathbb{R}^k$ be a query and $M$ be an $(\epsilon, \delta)$-differentially
private mechanism for answering $F$. Let $X$ be the uniform distribution over
$\{0,1\}^n$ and $\Gamma$ be the probability distribution over the transcripts of $M(x)$ when
$x$ is drawn from $X$. Then for any $u > 0$ and $t \leftarrow \Gamma$, the distribution $X|_{\Gamma = t}$ is a
$\delta_t$-approximate strongly $2^{\epsilon+u}$-unpredictable source such that

$$\mathbb{E}_{t \in \Gamma}[\delta_t] \le 2\delta \cdot \frac{1 + e^{-\epsilon - u}}{1 - e^{-u}}.$$


The above lemma trivially follows from Lemma 20 of [13] (full version) and
hence we do not prove it here. Before proving Theorem 5, we need to recall the
following theorem from [10] (Theorem 24 in the paper).


Theorem 6. For any $\gamma > 0$ and any $\nu = \nu(n)$, there is a constant $\alpha = \alpha(\gamma) > 0$
such that for $k = \alpha n$, there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ and an
algorithm $A$ such that given $\tilde{y}$ which satisfies


the output of A ony i.e., A(y) = x' such that x’ € {0,1}” and ||x — 2’ |l, < Aye 
The following corollary follows immediately from Theorem 6.


Corollary 1. For any $\delta' > 0$, there are positive constants $\gamma = \gamma(\delta')$, $\eta =
\eta(\delta')$, $\alpha = \alpha(\delta')$ such that for $k = \alpha n$, there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$
and an algorithm $A$ such that given $\tilde{y}$ which satisfies

$$\Pr_{i \in [k]}\left[|\tilde{y}_i - F(x)_i| \le \eta\sqrt{n}\right] \ge \frac{1}{2} + \gamma$$

the output of $A$ on $\tilde{y}$, i.e., $A(\tilde{y}) = x'$, is such that $x' \in \{0,1\}^n$ and $\|x - x'\|_1 \le \delta' n$.

We now prove Theorem 5.


Proof (of Theorem 5).
Let $X$ denote the uniform distribution over $\{0,1\}^n$. First, using Lemma 3,
we get that over the randomness of the mechanism $M$ and the choice of $x \in X$, if
we sample a transcript $t$ from $M(x, F)$, then for any positive $u$, the distribution
$X|_{M(x,F) = t}$ is a $\delta_t$-approximate strongly $2^{\epsilon+u}$-unpredictable source where $\delta_t$
satisfies

$$\mathbb{E}_{t \in M(x, F)}[\delta_t] \le 2\delta \cdot \frac{1 + e^{-\epsilon - u}}{1 - e^{-u}}.$$

Clearly, we can put $u = 10$ and get that the distribution $X|_{M(x,F) = t}$ is a $\delta_t$-
approximate strongly $2^{\epsilon+10}$-unpredictable source where $\mathbb{E}_{t \in M(x,F)}[\delta_t] \le 3\delta$.


By an application of Markov's inequality, we get that with probability $1 - 2\sqrt{\delta}$
over the choice of $x$ and the randomness of the mechanism $M$, the distribution
$X|_{M(x,F) = t}$ is a $2\sqrt{\delta}$-approximate strongly $2^{\epsilon+10}$-unpredictable source.
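
For completeness, the Markov step can be spelled out as follows (a short
derivation added for the reader; it uses only the bound of $3\delta$ on the expectation
of $\delta_t$ stated above):

```latex
\Pr_{t}\left[\delta_t \ge 2\sqrt{\delta}\right]
  \le \frac{\mathbb{E}_{t}[\delta_t]}{2\sqrt{\delta}}
  \le \frac{3\delta}{2\sqrt{\delta}}
  = \frac{3}{2}\sqrt{\delta}
  \le 2\sqrt{\delta}.
```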

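We now apply Corollary 1. In particular, we put $\delta' = \sqrt{\delta}$ and get that for
some positive $\gamma, \eta, \alpha$ (which are functions of $\delta'$ and hence $\delta$), there is a counting
query $F : \{0,1\}^n \to \mathbb{R}^k$ and an algorithm $A$ such that given $\tilde{y}$ which satisfies

$$\Pr_{i \in [k]}\left[|\tilde{y}_i - F(x)_i| \le \eta\sqrt{n}\right] \ge \frac{1}{2} + \gamma$$

the output of $A$ on $\tilde{y}$, i.e., $A(\tilde{y}) = x'$, is such that $x' \in \{0,1\}^n$ and $\|x - x'\|_1 \le \sqrt{\delta} \cdot n$.
Now, consider a mechanism $M$ which satisfies

$$\Pr\left[\Pr_{i \in [k]}\left[|M(x, F)_i - F(x)_i| \le \eta\sqrt{n}\right] \ge 1/2 + \gamma\right] \ge \beta$$

for $\beta = 3\sqrt{\delta}$. Clearly such a mechanism $M$ is not $(\epsilon, \delta)$-differentially private,
because with probability at least $\beta = 3\sqrt{\delta}$, the algorithm $A$ will be able to
predict at least a $1 - \sqrt{\delta}$ fraction of the positions, which contradicts the fact that
with probability $1 - 2\sqrt{\delta}$, the distribution $X|_{M(x,F) = t}$ is a $2\sqrt{\delta}$-approximate strongly
$2^{\epsilon+10}$-unpredictable source.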


5 LP Decoding, Euclidean Sections and Hardness of
Releasing $\ell$-way Marginals


In this section, we consider attacks on privacy using linear programming. In
particular, we use the technique of LP decoding (previously used in [10] in the context
of privacy) to give attacks which violate even minimal notions of privacy when a
$1 - \epsilon_0$ fraction (for some $\epsilon_0 > 0$) of the queries are released with insufficient
noise. We do this by establishing a connection between Euclidean sections and the
use of LP decoding in the context of privacy, which does not seem to have explicitly
appeared in the literature before. We remark that the relation between LP
decoding and Euclidean sections is very well known in the context of compressed sensing
[4]. However, in the case of privacy, the adversary is allowed to add small error to
say 99% of the entries and arbitrary error to the remaining 1% of the entries. In
the context of compressed sensing, however, the adversary is allowed to add error to
only 1% of the entries.

We first describe how to use linear programming in the context of privacy.
Assume x ∈ Z_+^d is a database and A: R^d → R^k is a linear map which represents
a counting query with arity k made on the database x. Further, the right set of
answers is given by y = A·x. (To make sure that the queries are 1-Lipschitz, all
the entries of A come from [−1,1].) Suppose ŷ ∈ R^k is the answer returned by
the mechanism. Then, consider the following optimization problem (which can
be written as a linear program):


Minimize ||y − ŷ||_1 subject to y = A·x̂    (2)
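
One standard way to write problem (2) as a linear program is sketched below. Here x̂ denotes the LP variable and ŷ the mechanism's answer. The auxiliary variables t_1, ..., t_k are an assumption of this sketch and do not appear in the text; each one bounds a residual from both sides:

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{i=1}^{k} t_i \\
\text{subject to}\quad & -t_i \;\le\; (A\hat{x})_i - \hat{y}_i \;\le\; t_i,
                         \qquad i = 1, \dots, k, \\
                       & \hat{x} \in \mathbb{R}^{d},\; t \in \mathbb{R}^{k}.
\end{aligned}
```

At an optimum each t_i equals |(A x̂)_i − ŷ_i|, so the objective value coincides with ||A x̂ − ŷ||_1.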


The following theorem states conditions under which the solution to
the above linear program, call it x̂, is such that ||x − x̂||_1 is small. To state the
theorem, we will need the definition of a Euclidean section.


Definition 7. V ⊆ R^k is said to be a (δ, d, k) Euclidean section if V is a linear
subspace of dimension d and for every x ∈ V, the following holds:


\[ \sqrt{k}\,\|x\|_2 \;\ge\; \|x\|_1 \;\ge\; \delta\sqrt{k}\,\|x\|_2 \]
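
To make the two inequalities concrete, the following minimal sketch computes the ratio ||x||_1 / (√k ||x||_2) for a single vector. The function name is hypothetical. The upper bound in Definition 7 always holds by Cauchy-Schwarz, so the content of the definition is that this ratio stays at least δ for every nonzero x in V.

```python
import math


def l1_to_l2_ratio(x):
    """Return ||x||_1 / (sqrt(k) * ||x||_2) for a nonzero vector x of length k."""
    k = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return l1 / (math.sqrt(k) * l2)


flat = [1.0, 1.0, 1.0, 1.0]      # perfectly spread mass: ratio 1
spiky = [1.0, 0.0, 0.0, 0.0]     # all mass on one coordinate: ratio 1/sqrt(k)
flat_ratio = l1_to_l2_ratio(flat)
spiky_ratio = l1_to_l2_ratio(spiky)
```

Vectors with spread-out mass have a ratio close to 1, while sparse vectors have a small ratio. A Euclidean section is therefore a subspace that contains no nearly sparse vectors.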


Theorem 7. Let A: R^d → R^k be a full rank linear map (k > d) such that all the
singular values of A are at least σ. Further, the range of A (denoted by L(A))
is a (δ, d, k) Euclidean section. Let F: (Z^+)^d → R^k be the query corresponding to
A. Then, there exists γ = γ(δ) such that if


\[ \Pr_{i \in [k]}\big[\, |y_i - \hat{y}_i| \le \alpha \,\big] \ge 1 - \gamma \]


then any solution x̂ to the linear program (2) satisfies ||x − x̂||_1 ≤ O(α√(kd)/σ),
where the constant inside the O(·) notation depends on δ.


The proof of this theorem can be found in [5]. The specific problem we are
interested in is the application of LP decoding to violate attribute privacy when
ℓ-way marginals of a contingency table are released. Informally, attribute privacy
refers to the situation in a contingency table when all but one of the attributes
are public and attacks on privacy amount to revealing the last attribute given
the responses to the queries and knowledge of all the other attributes. Releasing
the ℓ-way marginals is simply the following: for every subset of size ℓ of the
attributes and every configuration of these ℓ attributes, a count of how many
entries in the database have that specific configuration on those ℓ attributes is
released. Due to the lack of space, we refer the reader to [12, 5] for the precise
definitions of attribute privacy and ℓ-way marginals. We will also need the
definition of row products of matrices, which can be found in [5]. The next theorem
(proven in [5]) shows how, if the range of a row product of matrices is Euclidean
and all the singular values of the row product are large, one can violate attribute
privacy when noisy ℓ-way marginals are released.
Lemma 4. Let A_1, ..., A_{ℓ−1} ∈ {0,1}^{d×n}. Let A = A_1 ∘ A_2 ∘ ... ∘ A_{ℓ−1} (with
d^{ℓ−1} > n) be their row product. Also, all the singular values of A are at least
σ and the range of A, i.e., L(A), is a (δ, n, d^{ℓ−1}) Euclidean section. Then, there
exists a constant γ = γ(δ) > 0 such that any mechanism which answers at
least a 1 − γ fraction of the ℓ-way marginals with noise bounded by α is attribute
non-private provided α√(d^{ℓ−1} n)/σ = o(n), or in other words, α = o(√n σ/√(d^{ℓ−1})).

The main technical tool for us is the following theorem of Rudelson [15].


Theorem 8. Let q, ℓ ∈ N be constants. Also, let D be a distribution over matrices
in R^{d×n} such that every entry of the matrix is an independent
and unbiased {0,1} random variable. Let A_1, ..., A_{ℓ−1} be i.i.d. copies of random
matrices drawn from the distribution D and A be the Hadamard product of
A_1, ..., A_{ℓ−1}. Then, provided that d^{ℓ−1} ≫ n log_{(q)} n, with probability 1 − o(1),
the smallest singular value of A, denoted by σ_n(A), satisfies σ_n(A) = Ω(√(d^{ℓ−1})).
Also, the range of A is a (γ(q,ℓ), n, d^{ℓ−1}) Euclidean section for some γ(q,ℓ) > 0.


The above theorem uses the notion of iterated logarithm, which is defined as follows.
For r ∈ N, log_{(1)} n = max{log_2 n, 1} and, for r > 1,
log_{(r)} n = log_{(1)}(log_{(r−1)} n). Combining Theorem 8 and Lemma 4 we have the
main theorem of this section.


Theorem 9. Let q, ℓ ∈ N be constant integers. Then, there exists a constant
γ = γ(q, ℓ) > 0 such that any mechanism which releases the ℓ-way marginals of
a table of size n over d attributes, with n ≤ d^{ℓ−1}/log_{(q)} n, by adding at most η
noise to a 1 − γ fraction of the queries, where


η = o(√n),


is attribute non-private. Further, the algorithm which violates attribute privacy
is efficient and uses LP decoding.
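
The iterated logarithm that appears in Theorems 8 and 9 is easy to compute directly from its recursive definition. The sketch below is only an illustration; the function name is an assumption, and base-2 logarithms are used as in the definition above.

```python
import math


def iterated_log(n, r):
    """Compute log_(r) n, where log_(1) n = max(log2 n, 1) and
    log_(r) n = log_(1)(log_(r-1) n) for r > 1."""
    if r < 1 or n <= 0:
        raise ValueError("need r >= 1 and n > 0")
    value = float(n)
    for _ in range(r):
        # One application of log_(1): never drops below 1.
        value = max(math.log2(value), 1.0)
    return value


levels = [iterated_log(65536, r) for r in range(1, 6)]
```

For n = 65536 the values shrink very quickly with r, which is why the size condition in Theorem 9 is only slightly stronger than n ≤ d^{ℓ−1}.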


This improves upon the following result of Kasiviswanathan et al. [12] who could
violate attribute privacy only when all the queries were allowed o(√n) noise.


Theorem 10. Let £ € N be a constant and n,d € N such that d'‘! > 
n-log*n. Then, for every mechanism M which releases €-way marginals of 
a database of size n (and universe {0,1}* ) such that the noise for every single 
query is bounded by n where n < aa is attribute non-private. The attack 


is an efficient algorithm based on 2 norm minimization. 


The details of the results in this section can be found in [5]. 
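
For orientation, the following sketch shows how the parameters of Lemma 4 and Theorem 8 combine to give the noise threshold of Theorem 9. It suppresses constants, writes c > 0 for the constant hidden in the Ω(·) of Theorem 8 (a name introduced here), and is not a substitute for the proof in [5].

```latex
\sigma \;\ge\; c\,\sqrt{d^{\ell-1}}
\quad\Longrightarrow\quad
\frac{\sqrt{n}\,\sigma}{\sqrt{d^{\ell-1}}} \;\ge\; c\,\sqrt{n},
\qquad\text{so}\qquad
\eta = o(\sqrt{n})
\;\Longrightarrow\;
\eta = o\!\left(\frac{\sqrt{n}\,\sigma}{\sqrt{d^{\ell-1}}}\right).
```

That is, once the smallest singular value is of order √(d^{ℓ−1}), the condition of Lemma 4 on the noise is met by any η = o(√n).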
Abstract. In this paper we study the problem of approximately releasing the
cut function of a graph while preserving differential privacy, and give new algo- 
rithms (and new analyses of existing algorithms) in both the interactive and non- 
interactive settings. 

Our algorithms in the interactive setting are achieved by revisiting the prob- 
lem of releasing differentially private, approximate answers to a large number of 
queries on a database. We show that several algorithms for this problem fall into 
the same basic framework, and are based on the existence of objects which we 
call iterative database construction algorithms. We give a new generic framework 
in which new (efficient) IDC algorithms give rise to new (efficient) interactive 
private query release mechanisms. Our modular analysis simplifies and tightens 
the analysis of previous algorithms, leading to improved bounds. We then give a 
new IDC algorithm (and therefore a new private, interactive query release mech- 
anism) based on the Frieze/Kannan low-rank matrix decomposition. This new re- 
lease mechanism gives an improvement on prior work in a range of parameters 
where the size of the database is comparable to the size of the data universe (such 
as releasing all cut queries on dense graphs). 

We also give a non-interactive algorithm for efficiently releasing private
synthetic data for graph cuts with error O(|V|^{1.5}). Our algorithm is based on
randomized response and a non-private implementation of the SDP-based,
constant-factor approximation algorithm for cut-norm due to Alon and Naor. Finally, we
give a reduction based on the IDC framework showing that an efficient, private al- 
gorithm for computing sufficiently accurate rank-1 matrix approximations would 
lead to an improved efficient algorithm for releasing private synthetic data for 
graph cuts. We leave finding such an algorithm as our main open problem. 


1 Introduction 


Consider a graph representing the online communications between a set of individuals; 
each vertex represents a user, and an edge between two users indicates that they have 
corresponded by email. It might be useful to allow data analysts to mine this graph for 
statistical information. However, the graph is also composed of sensitive information, 
and we cannot release information that reveals much about the existence of specific
edges. Thus we would like a way to analyze the structure of this graph while
protecting the privacy of individual edges. Specifically we would like to guarantee
differential privacy [7] (defined in Section 2), which, roughly, requires that our
algorithms be randomized, and induce nearly the same distribution over outcomes when
given two data sets (e.g. graphs) which differ in only a single point (e.g. an edge).


Table 1. Comparison of accuracy bounds for linear queries. The bounds in the first column are
prior to this work, the second column are what we achieve in this work, and the last column are the
new bounds instantiated for releasing all cut queries. The bounds listed here are approximate and
hide the dependence on certain parameters, such as δ and β. n denotes database size, k denotes the
total number of queries answered, and X represents the data universe. For a graph G = (V, E),
n = m = |E|, |X| = \binom{|V|}{2}, and for all cut queries, k = 2^{2|V|}. Previous efficient results do not
achieve non-trivial (≤ |E|) error, while all of the new bounds do for sufficiently dense graphs.


Mechanism                       Previous Bounds                        New in this paper: General Bounds                  New in this paper: All Cut Queries

Median Mechanism (a)
Online Multiplicative Weights   n^{1/2} (log k) (log |X|)^{1/4} / ε    n^{1/2} (log k)^{1/2} (log |X|)^{1/4} / ε^{1/2}    |E|^{1/2} |V|^{1/2} (log |V|)^{1/4} / ε^{1/2}
Frieze/Kannan IDC (b)           New in this paper                      n_2^{1/4} (log k)^{1/2} |X|^{1/4} / ε^{1/2}
K-Norm Mech. (c)                k^{1/2} (log(|X|/k))^{1/2}             Not in IDC Framework


(a) The bounds listed here are for linear queries. The Median Mechanism more generally works
for any set of low sensitivity queries Q that have an α-net of size N_α(Q). We improve the
l 2p Vos Na (Q) log k 
bound from the solution to œ = Iela (2) los E to the solution to a = vice Ye) 
(b) Here we use n_2 = ||D||_2^2, in contrast to other known IDCs, whose error is in terms of
n = ||D||_1. Note that n ≤ n_2 ≤ n^2.
(c) For k ≤ |X|/2. This is an approximate bound on average per-query error. All other algorithms
listed bound worst-case per-query error.


One natural objective is to provide private access to the cut function of this graph. 
That is, to provide a privacy preserving way for a data analyst to specify any two (of 
the exponentially many) subsets of individuals, and to discover (up to some error) the 
number of email correspondences that have passed between these two groups. There are 
two ways we might try to achieve this goal: We could give an interactive solution where 
we give the analyst private oracle access to the cut function. Here the user can write 
down any sequence of cut queries and the oracle will respond with private, approximate 
answers. We may also try for a stronger, non-interactive solution, in which we release 
a private synthetic dataset; a new, private graph that approximately preserves the cut 
function of the original graph. 
The case of answering cut queries on a graph is just one instance of the more general
problem of query release for exponentially sized families of linear queries on a data set.
Although this problem has been extensively studied in the differential privacy literature,
we observe that no previously known efficient solution is suitable for the case of
releasing all cut queries on graphs. In the setting of cut queries on a graph, we use
“efficient solution” roughly to mean one in which each query is answered in time
poly(|V|), in the interactive setting, or one in which the whole construction runs in
time poly(|V|), in the non-interactive setting. In this paper we provide both efficient
interactive and non-interactive solutions for this problem.

We give a generic framework that converts objects we call iterative database
construction (IDC) algorithms into private query release mechanisms in both the
interactive and non-interactive settings. This framework generalizes the median
mechanism [19], the online multiplicative weights mechanism [15], and the offline
multiplicative weights mechanism [14]. Our framework gives a simple, modular analysis
of all of these mechanisms, which leads to tighter bounds in the interactive setting
than those given in [19] and [15]. These improved bounds are crucial to our objective
of giving non-trivial approximations to all possible cut queries. We also instantiate
this framework with a new IDC algorithm for arbitrary linear queries that is based on
the Frieze/Kannan low-rank matrix decomposition and is tailored to releasing cut
queries. This algorithm leads to a new online query release mechanism for linear
queries that gives a better approximation in settings (such as we would encounter
trying to answer all cut queries on a dense graph) where the database size is
comparable to the size of the data universe. We summarize our bounds in Table 1.

We also give a new algorithm (building on techniques for constructing private syn- 
thetic data in [2] [8]) in the non-interactive setting that efficiently generates private syn- 
thetic graphs that approximately preserve the cut function. Finally, we use our IDC 
framework to show that an efficient, private algorithm for privately computing good 
rank-1 approximations to matrices would automatically yield efficient private algo- 
rithms for releasing synthetic graphs with improved approximation guarantees. 


1.1 Our Results and Techniques 


Our main conceptual contribution is to define the abstraction of iterative database
construction (IDC) algorithms (Section 3) and to show that an efficient IDC for any
class of queries Q automatically yields an efficient private data release mechanism
for Q in both the interactive and non-interactive settings. Informally, IDCs construct
a data structure that can be used to answer all the queries in Q by iteratively
improving a hypothesis data structure. Moreover, they update the hypothesis when given
a query witnessing a significant difference between the hypothesis data structure and
the underlying database.

In hindsight, this framework generalizes the median mechanism [19] and the online
multiplicative weights mechanism [15]. It also generalizes the offline multiplicative
weights mechanism [14]. All of these mechanisms can be seen to use IDCs of the
sort we define in this work. (In Appendix A we show how these algorithms fall into the
IDC framework.)
Our generalization and abstraction also allows for a simple, modular analysis of
mechanisms based on IDCs. Using this analysis, we are able to show improved bounds
on the accuracy of both the median mechanism and the multiplicative weights mechanism.
These improved bounds are significant in our application of using an interactive
mechanism to release a large number of cut queries and crucial if we want to answer all
cut queries. When answering all cut queries, the previous bounds would not guarantee
error that is ≤ |E|, meaning that the error may be larger than the largest cut in the
graph. Of course, we can privately guarantee error ≤ |E| simply by releasing the answer
0 for every cut query. Our new analysis shows that these mechanisms are capable of
answering all 2^{2|V|} cut queries with error o(|E|) on sufficiently dense graphs; e.g.,
multiplicative weights gives sublinear error for graphs with |E| = ω(|V| √(log|V|)).

Although it may seem unrealistic to answer all cut queries using an interactive mech- 
anism, our new analysis allows us to give a best-of-both-worlds guarantee that we can 
answer each query efficiently with non-trivial accuracy without ever having to “shut 
off” the algorithm for answering too many queries. In practice it may be preferable to 
limit the number of queries the interactive mechanism will have to answer, in order to 
improve the accuracy of the responses. In this case our new bounds still offer signifi- 
cant improvements in accuracy. 

We also define a new IDC based on the Frieze/Kannan low-rank matrix decompo- 
sition [10], which yields a private interactive mechanism for releasing linear queries. 
Our new mechanism outperforms previously known techniques when the size of the 
database is comparable to the size of the data universe, as is the case on a dense graph. 
The error for the Frieze/Kannan IDC is smaller than that for multiplicative weights for 
extremely dense graphs, where |E| = Ω(|V|^2/log|V|).

We then consider the problem of efficiently releasing private synthetic data for the
class of cut queries. We show that a technique based on randomized response efficiently
yields a private data structure (but not a synthetic database) capable of answering any
cut query on a graph with |V| vertices up to maximum error O(|V|^{1.5}). (Note that this
error is independent of the density of the graph, and that the Frieze/Kannan and
multiplicative weights IDCs introduce smaller error for sparser graphs.) We then show
how to use this data structure to efficiently construct a synthetic database with only
a small constant factor blowup in our error. Our algorithm is based on a technique for
constructing synthetic data in [2, 8]. Their observation is that, for linear queries,
the set of accurate synthetic databases is described by a (large) set of linear
constraints. In the case of cut queries, we are able to use a constant-factor
approximation to the cut-norm due to Alon and Naor as the separation oracle to find a
feasible solution (and thus a synthetic database) efficiently. Finally, we show how the
existence of an efficient private algorithm for finding good low-rank approximations to
matrices would imply the existence of an improved algorithm for privately releasing
synthetic data for cut queries, using our IDC framework.

To summarize the results for cut queries: between the multiplicative weights IDC,
the Frieze/Kannan IDC, and randomized response, the best mechanism depends on |E|.
When |E| is below O(|V|^2/log|V|), the multiplicative weights IDC introduces the
least error. For |E| lying between O(|V|^2/log|V|) and O(|V|^2), the Frieze/Kannan
IDC introduces the least error. Both IDC mechanisms have error increasing with |E|,
finally matching the error for randomized response when |E| = O(|V|^2). When
answering k queries, the error for all three mechanisms depends on √(log k), so these
thresholds are independent of the number of queries.


1.2 Related Work 


Differential privacy, introduced in a series of papers [4] [6] [7] in the last decade, has 
become a standard solution concept for statistical database privacy. The first mecha- 
nism for simultaneously releasing the answers to exponentially large classes of statisti- 
cal queries was given in [5]. They showed that the existence of small nets for a class of 
queries Q automatically yields a (computationally inefficient) non-interactive, private 
algorithm for releasing answers to all the queries in Q with low error. Subsequent
improvements were given by Dwork et al. [8, 9].

Roth and Roughgarden [19] showed that large classes of queries could also be
released with low error in the interactive setting, in which queries may arrive online,
and the mechanism must provide answers before knowing which queries will arrive in the
future. Subsequently, Hardt and Rothblum [15] gave improved bounds for the online
query release problem based on the multiplicative weights algorithm. In hindsight, both
of these algorithms follow the same basic framework, which is to use an IDC.

Gupta et al. gave a non-interactive data release mechanism based on the multi- 
plicative weights algorithm and an arbitrary agnostic learner for a class of queries. An 
instantiation of this algorithm (the offline multiplicative weights algorithm) using the 
generic agnostic learner of Kasiviswanathan et al. (who use the exponential mech- 
anism of [18]) was implemented and experimentally evaluated on the task of releasing 
small conjunctions to low error on real data by Hardt, Ligett, and McSherry [14]. This 
algorithm gives bounds comparable to those given in this paper, but it does not work 
in the interactive setting, and is not computationally efficient for settings in which the 
number of queries is exponentially larger than the database size (as is the case with 
graph cuts). We note in Section [/]that this generic algorithm can also be instantiated 
with any iterative database construction algorithm. 

Hardt and Talwar consider the setting where the number of queries is smaller
than the universe size and introduced the K-Norm mechanism. Subsequent improvements
were given by [3]. When the number of queries and the database size are comparable
to the universe size (i.e. |Q| = Ω(|X|), n ≥ Ω(|X|/log|X|)), the K-Norm
mechanism gives average error that is smaller than the worst-case error promised by the
online multiplicative weights mechanism. In this range of parameters the Frieze/Kannan
IDC and the K-Norm mechanism both improve on the online multiplicative weights
mechanism, and give roughly the same error. However, the Frieze/Kannan IDC has bounded
worst-case error, as opposed to average-case error. In general the two mechanisms are
incomparable, as the Frieze/Kannan IDC has bounded worst-case error and
applies even when |Q| > |X|, but its error has polynomial, rather than logarithmic,
dependence on |X|.

The Frieze-Kannan low-rank approximation (or the weak regularity lemma) shows
that every matrix can be approximated by a sum of a small number of cut matrices
[10], and this fact has many important algorithmic applications. We also use the fact that
the proof extends to more general settings, as was noted by [20].
2 Preliminaries


In this paper, we study datasets D that consist of collections of n elements from some
universe X. We can also write D ∈ ℕ^{|X|} when it is convenient to represent D as a
histogram over X. We say that two databases D, D' are adjacent if they differ in only
a single element. As histograms, they are adjacent if ||D − D'||_1 ≤ 1. We will require
that our algorithms satisfy differential privacy:


Definition 1 (Differential Privacy). A randomized algorithm M : ℕ^{|X|} → R (for any
abstract range R) satisfies (ε, δ)-differential privacy if for all adjacent databases D and
D', and for all events S ⊆ R, Pr[M(D) ∈ S] ≤ exp(ε) Pr[M(D') ∈ S] + δ.


We will generally think of ε as being a small constant, and δ as being negligibly small,
i.e. smaller than any inverse polynomial function of n.

We note that when we discuss interactive mechanisms, we must view the output
of a mechanism as the transcript of an interaction between an adaptive adversary who
supplies questions about the database based on previous outcomes of the mechanism,
and the mechanism itself. For clarity, in this paper we will elide specifics about the
model of adaptive private composition. For a detailed treatment of this issue, see [9].

A useful distribution is the Laplace distribution. 


Definition 2 (The Laplace Distribution). The Laplace Distribution with mean 0 and
scale b is the distribution with probability density function Lap(x|b) = (1/(2b)) exp(−|x|/b).
We will sometimes write Lap(b) to denote the Laplace distribution with scale b, and
will sometimes abuse notation and write Lap(b) simply to denote a random variable
X ∼ Lap(b).


A fundamental result in data privacy is that perturbing low sensitivity queries with
Laplace noise preserves (ε, 0)-differential privacy.


Theorem 1 ([7]). Suppose Q : ℕ^{|X|} → ℝ^k is a function such that for all adjacent
databases D and D', ||Q(D) − Q(D')||_1 ≤ 1. Then the procedure which on input D
releases Q(D) + (X_1, ..., X_k), where each X_i is an independent draw from a Lap(1/ε)
distribution, preserves (ε, 0)-differential privacy.
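
As an illustration of Theorem 1, the following minimal sketch adds independent Lap(1/ε) noise to a vector of answers. The function names and the example answers are assumptions made here. The sampler relies on the fact that the difference of two independent exponential variables with mean b has the Lap(b) distribution.

```python
import random


def laplace_noise(rng, scale):
    """Draw one sample from Lap(scale) as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answers, epsilon, rng):
    """Release Q(D) + (X_1, ..., X_k) with each X_i ~ Lap(1/epsilon).

    Assumes the caller has verified the hypothesis of Theorem 1: the answer
    vector changes by at most 1 in L1 norm between adjacent databases.
    """
    scale = 1.0 / epsilon
    return [a + laplace_noise(rng, scale) for a in true_answers]


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
noisy = laplace_mechanism([12.0, 3.0, 7.0], epsilon=0.5, rng=rng)
```

Smaller ε means a larger noise scale 1/ε and therefore stronger privacy at the cost of accuracy.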


It will be useful to understand how privacy parameters for individual steps of an algo- 
rithm compose into privacy guarantees for the entire algorithm. The following useful 
theorem is due to Dwork, Rothblum, and Vadhan: 


Theorem 2 ([9]). Let 0 < ε < 1 be a parameter. Let P, Q be probability measures
supported on a set S such that max_{s∈S} |log(P(s)/Q(s))| ≤ ε. Then


\[ \mathbb{E}_{s \sim P}\big[\log\big(P(s)/Q(s)\big)\big] \le 2\epsilon^2. \]


We are interested in privately releasing accurate answers to large collections of queries.
Queries are functions Q : ℕ^{|X|} → ℝ, and we denote collections of queries by Q. We
write k = |Q| to denote the cardinality of the set of queries.

A common type of query is the linear query. A linear query Q has a representation
as a vector in [0, 1]^{|X|}, and can be evaluated on a database by taking the dot product
between the query and the histogram representation of the database: Q(D) = Q · D.
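
A small sketch of the histogram representation and of evaluating a linear query as a dot product follows. The three-element universe and the query weights are hypothetical values chosen only for illustration.

```python
from collections import Counter


def histogram(dataset, universe):
    """Represent a dataset (a list of universe elements) as a vector of counts."""
    counts = Counter(dataset)
    return [counts[x] for x in universe]


def linear_query(query, hist):
    """Evaluate Q(D) = Q . D for a query vector with entries in [0, 1]."""
    return sum(q * h for q, h in zip(query, hist))


def l1_distance(hist_a, hist_b):
    """L1 distance between two histograms over the same universe."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


universe = ["a", "b", "c"]                    # hypothetical universe
D = histogram(["a", "c", "c"], universe)
D_adjacent = histogram(["a", "c"], universe)  # one element removed
Q = [1.0, 0.0, 0.5]
answer = linear_query(Q, D)
```

Because every entry of Q lies in [0, 1], the answers on adjacent databases differ by at most 1, which is the sensitivity condition needed for Laplace noise.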
Definition 3 (Accuracy). Let Q be a set of queries. A mechanism M : ℕ^{|X|} → R
is (α, β)-accurate for Q if there exists a function Eval : Q × R → ℝ s.t. for every
database D ∈ ℕ^{|X|}, with probability at least 1 − β over the coins of M, M(D) outputs
r ∈ R such that max_{Q∈Q} |Q(D) − Eval(Q, r)| ≤ α. We will abuse notation and write
Q(r) = Eval(Q, r).


We say that an algorithm M releases synthetic data (as is the case for our new IDC,
as well as the multiplicative weights IDC [15]) if R = ℕ^{|X|}. In this case, M(D) =
D̂ ∈ ℕ^{|X|} and Eval(Q, D̂) = Q(D̂). We say that a synthetic data release algorithm is
efficient if it runs in time polynomial in n = ||D||_1, the size of the data set. Note that if
n < |X|, efficient algorithms will have to input and output concise representations of
the dataset (i.e., as collections of items from the universe) instead of using the histogram
representation. Nevertheless, it will be convenient to think of datasets as histograms.

We say an algorithm efficiently releases k queries from a class Q in the interactive
setting if on an arbitrary, adaptively chosen stream of queries Q_1, ..., Q_k, it outputs
answers a_1, ..., a_k. The algorithm must output each a_i after receiving query Q_i but
before receiving Q_{i+1}, and is only allowed poly(n) run time per query. We are typically
interested in the case when k can be exponentially large in n. Note that as far as
computational efficiency is concerned, releasing synthetic data for a class of queries Q
is at least as difficult as releasing queries from Q in the interactive setting, since we
can use the synthetic data to answer queries interactively.


Graphs and Cuts. When we consider datasets that represent graphs G = (V, E), we
think of the database as being the edge set D_G = E, and the data universe being the
collection of all possible edges in the complete graph: |X| = \binom{|V|}{2}. That is, we
consider the vertex set to be common among all graphs, which differ only in their edge
sets. One example we care about is approximating the cut function of a sensitive graph G.

For any real-valued matrix A ∈ ℝ^{m×m'}, for S ⊆ [m] and T ⊆ [m'], we define
A(S,T) := Σ_{s∈S, t∈T} a_{s,t}. The cut norm of the matrix A is now defined as
||A||_C := max_{S⊆[m], T⊆[m']} |A(S,T)|. A graph G can be represented as its adjacency
matrix A_G ∈ {0,1}^{|V|×|V|}. In this paper, a cut in a graph G is defined by any two
subsets of vertices S, T ⊆ V. We write the value of an (S,T) cut in G as
G(S,T) := A_G(S,T), where A_G is the adjacency matrix of G. Similarly, we extend the
definition of cut norm to n-vertex graphs naturally by defining
||G||_C := ||A_G||_C = max_{S,T⊆V} |G(S,T)| and ||G − H||_C := ||A_G − A_H||_C. The class
of cut queries is Q_cut = {Q_{S,T} : S,T ⊆ V}, where Q_{S,T}(G) = A_G(S,T). Note that cut
queries are an example of a class of linear queries, because we can represent them as a
vector in which Q_{S,T}[i,j] = 1 if i ∈ S, j ∈ T and 0 otherwise, and evaluate
Q_{S,T}(G) = Σ_{i,j∈V} Q_{S,T}[i,j] · A_G[i,j].

Note that as linear queries, we can write cut queries as the outer product of two
vectors: Q_{S,T} = χ_S · χ_T^T, where χ_S, χ_T ∈ {0,1}^{|V|} are the characteristic vectors of
the sets S and T respectively. Let us define a more general class of rank-1 queries on
graphs to be a subset of all linear queries: Q_{r1} = {Q ∈ [0,1]^{|V|×|V|} such that
Q = u · v^T for some vectors u, v ∈ [0,1]^{|V|}}. Of course the set of rank-1 queries includes
the set of cut queries, and any mechanism that is accurate with respect to rank-1 queries
is also accurate with respect to cut queries.
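
The two views of a cut query, as a sum over S × T and as an entrywise product with the outer product of characteristic vectors, can be checked against each other on a small example. The sketch below uses a hypothetical four-vertex path graph and hypothetical function names; it illustrates the definitions only and is not part of any release mechanism.

```python
def cut_value(adj, S, T):
    """G(S, T): the sum of A_G[s][t] over s in S and t in T."""
    return sum(adj[s][t] for s in S for t in T)


def characteristic(num_vertices, subset):
    """0/1 characteristic vector of a vertex subset."""
    return [1 if v in subset else 0 for v in range(num_vertices)]


def cut_query_matrix(num_vertices, S, T):
    """Q_{S,T} = chi_S * chi_T^T written out as an explicit 0/1 matrix."""
    chi_S = characteristic(num_vertices, S)
    chi_T = characteristic(num_vertices, T)
    return [[a * b for b in chi_T] for a in chi_S]


def evaluate_linear(query, adj):
    """Entrywise dot product: the sum over i, j of Q[i][j] * A_G[i][j]."""
    return sum(q * a
               for q_row, a_row in zip(query, adj)
               for q, a in zip(q_row, a_row))


# Hypothetical example: the path 0 - 1 - 2 - 3 with a symmetric adjacency matrix.
n_vertices = 4
adj = [[0] * n_vertices for _ in range(n_vertices)]
for u, v in [(0, 1), (1, 2), (2, 3)]:
    adj[u][v] = adj[v][u] = 1

S, T = {0, 1}, {2, 3}
direct = cut_value(adj, S, T)
via_outer_product = evaluate_linear(cut_query_matrix(n_vertices, S, T), adj)
```

Note that when S and T overlap, an edge inside the overlap is counted once for each orientation, exactly as the definition of A(S,T) prescribes.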
Proofs. Because of space constraints, many of the proofs in this paper have been
omitted. The interested reader can see full proofs in the full version of this paper: [13].


3 Iterative Database Constructions 


In this section we define the abstraction of iterative database constructions that
includes our new Frieze/Kannan construction and several existing algorithms as
special cases. Roughly, each of these mechanisms works by maintaining a sequence of
data structures D^(1), D^(2), ... that give increasingly good approximations to the input
database D (in a sense that depends on the IDC). Moreover, these mechanisms produce
the next data structure in the sequence by considering only one query Q that
distinguishes the real database in the sense that Q(D^(t)) differs significantly from Q(D).

Syntactically, we will consider functions of the form U : R_U × Q × ℝ → R_U. The
inputs to U are a data structure in R_U, which represents the current data structure D^(t);
a query Q, which represents the distinguishing query, and may be restricted to a certain
set Q; and also a real number, which estimates Q(D). Formally, we define a database
update sequence to capture the sequence of inputs to U used to generate the database
sequence D^(1), D^(2), ....
Definition 4 (Database Update Sequence). Let D ∈ ℕ^{|X|} be any database and let
{(D^(t), Q^(t), Â^(t))}_{t=1,...,C} ∈ (R_U × Q × ℝ)^C be a sequence of tuples. We say the
sequence is a (U, D, Q, α, C)-database update sequence if it satisfies the following
properties:


1. D^(1) = U(∅, ·, ·),
2. for every t = 1, 2, ..., C, |Q^(t)(D) − Q^(t)(D^(t))| ≥ α,
3. for every t = 1, 2, ..., C, |Q^(t)(D) − Â^(t)| < α,
4. and for every t = 1, 2, ..., C − 1, D^(t+1) = U(D^(t), Q^(t), Â^(t)).


We note that for all of the iterative database constructions we consider, the approximate
answer Â^(t) is used only to determine the sign of Q^(t)(D) − Q^(t)(D^(t)), which is the
motivation for requiring that Â^(t) have error smaller than α. The main measure of
efficiency we're interested in from an iterative database construction is the maximum
number of updates we need to perform before the database D^(t) approximates D well with
respect to the queries in Q. To this end we define an iterative database construction as
follows:


Definition 5 (Iterative Database Construction). Let U : R_U × Q × ℝ → R_U be an
update rule and let B : ℝ → ℝ be a function. We say U is a B(α)-iterative database
construction for query class Q if for every database D ∈ ℕ^{|X|}, every (U, D, Q, α, C)-
database update sequence satisfies C ≤ B(α).


Note that, by definition, if U is a B(α)-iterative database construction, then given any
maximal (U, D, Q, α, C)-database update sequence, the final database D^(C) must
satisfy max_{Q∈Q} |Q(D) − Q(D^(C))| < α, or else there would exist another query
satisfying property 2 of Definition 4, and thus there would exist a (U, D, Q, α, C+1)-database
update sequence, contradicting maximality.
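
To make the definitions concrete, the sketch below builds a maximal database update sequence for a small histogram, using a multiplicative-weights step as a stand-in for the update rule U. The step size α/(2n), the toy database, and all function names are assumptions of this sketch rather than the construction analyzed in the paper. For simplicity the exact answer Q(D) is used as the estimate, which trivially has error smaller than α.

```python
import math
from itertools import product


def dot(query, hist):
    return sum(q * h for q, h in zip(query, hist))


def mw_update(hyp, query, estimate, alpha):
    """Hypothetical update rule U: one multiplicative-weights step.

    Mass is moved toward coordinates where the query is large if the
    hypothesis underestimates the answer, and away from them otherwise.
    """
    n = sum(hyp)
    eta = alpha / (2.0 * n)  # assumed step size
    sign = 1.0 if estimate > dot(query, hyp) else -1.0
    weights = [h * math.exp(sign * eta * q) for q, h in zip(query, hyp)]
    total = sum(weights)
    return [n * w / total for w in weights]


def run_update_sequence(database, queries, alpha):
    """Update until no query distinguishes the hypothesis from the database.

    Returns the final hypothesis and the number of updates performed.
    """
    n = sum(database)
    hyp = [n / len(database)] * len(database)  # D^(1): uniform histogram
    num_updates = 0
    while True:
        gaps = [(abs(dot(q, database) - dot(q, hyp)), q) for q in queries]
        gap, worst = max(gaps, key=lambda pair: pair[0])
        if gap < alpha:  # no query satisfies property 2 any more
            return hyp, num_updates
        hyp = mw_update(hyp, worst, dot(worst, database), alpha)
        num_updates += 1


database = [5, 0, 3, 0, 0, 7, 1, 4]  # toy histogram, n = 20
queries = [list(bits) for bits in product([0, 1], repeat=len(database))]
alpha = 2.0
final_hyp, num_updates = run_update_sequence(database, queries, alpha)
```

With this step size, the usual relative-entropy potential argument bounds the number of updates by 4n² ln|X| / α², so the loop terminates; the function B(α) in Definition 5 plays exactly this role.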
4 Query Release from Iterative Database Construction


In this section we describe an interactive algorithm for releasing linear queries using an 
arbitrary iterative database construction. 


Algorithm 1. Online Query Release Mechanism 
M®"(D, e, 5,4, 8, k): 
Input: A database D ∈ ℕ^{|X|}, a parameter α ∈ ℝ, parameters ε, δ, β ∈ [0, 1], and the number
of queries k ∈ ℕ. Oracle access to U, a B = B(α)-iterative database construction for Q.
Parameters:


σ = σ(α) := 1000 √(B(α)) log(4/δ) / ε,     T = T(α) := 4σ(α) · log(2k/β).


Se D® := U(O,-,-), C = 0. 
For: t = 1,2,...,k 
1. Receive a query QM € Qand compute 


ZO Lap(o) AY = Q® (D) AY = Q(D)4+Z A® = QM (D™) 


2. If: |AM — AM| < T then: output A® and set D+) = DO 
Else: output A“, set D+) = U (oe QM, Am), and set C = C +1. 
3. If: C = B(a) then: terminate. 
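
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the convention of calling the update rule with `None` to obtain the initial hypothesis, and the use of a Frieze/Kannan-style rule (Section 5) as the example IDC are all assumptions made for illustration, and `sigma` and `T` are left as caller-supplied inputs rather than computed from $\alpha$.

```python
import random

def laplace(scale, rng):
    """Sample Lap(scale) as a difference of two exponentials (0 if scale is 0)."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def evaluate(query, database):
    """Linear query: the inner product <Q, D> over the data universe."""
    return sum(q * d for q, d in zip(query, database))

def online_query_release(D, queries, update, B, sigma, T, rng):
    """Sketch of the online loop: answer from the hypothesis when it agrees
    with the noisy answer up to T, otherwise release the noisy answer and update."""
    hypothesis = update(None, None, None)      # hypothetical convention for U(empty, ., .)
    updates = 0
    answers = []
    for Q in queries:
        noisy = evaluate(Q, D) + laplace(sigma, rng)   # noisy true answer
        guess = evaluate(Q, hypothesis)                # answer on the hypothesis
        if abs(noisy - guess) <= T:
            answers.append(guess)                      # no update in this round
        else:
            answers.append(noisy)                      # update round
            hypothesis = update(hypothesis, Q, noisy)
            updates += 1
        if updates == B:
            break                                      # update budget exhausted
    return answers, hypothesis, updates

def make_fk_update(alpha, universe_size):
    """Example update rule in the style of Algorithm 2 (an illustrative choice)."""
    def update(hyp, Q, A):
        if hyp is None:
            return [0.0] * universe_size
        step = alpha / universe_size
        sign = 1.0 if evaluate(Q, hyp) - A > 0 else -1.0
        return [h - sign * step * q for h, q in zip(hyp, Q)]
    return update
```

With `sigma = 0` the loop is deterministic, which makes it easy to check that every released answer is within `T` of the true answer; with `sigma > 0` the same code adds Laplace noise before the threshold test.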


4.1 Privacy Analysis 
Theorem 3. Algorithm 1 is $(\varepsilon, \delta)$-differentially private.


Proof (Proof Sketch). Our privacy analysis follows the approach of straightfor- 
wardly. The details appear in the full version of the paper. Intuitively, we will try to 
classify the answers to the queries by the amount of “information leaked about the 
database.” This classification will lead to a bound on the total amount of information 
leaked, and a tighter bound can be deduced using Theorem 2.

At a very high level, the argument can be thought of in two steps. The first is to argue 
that the noise we add has large enough magnitude that the information leaked in the 
(small number of) “update rounds” is small. This step is simple and follows from the 
bound on the number of update rounds and the well-known properties of the Laplace 
distribution. The second step is to argue that the location of the update rounds also leaks 
little information. This second step is more difficult, and requires reasoning carefully 
about rounds that are “close to update rounds.” 

More specifically, though still informally, we will consider three possible ranges for
the value of the noise $Z^{(t)}$ in each round $t = 1, 2, \ldots, k$. Intuitively the three cases
are as follows: 1) The noise is sufficiently small that there would never be an update,
even if the input database were exchanged with an adjacent one. Here we argue no
information is leaked. 2) The noise is sufficiently large that there would always be an
update, even if the database were exchanged with an adjacent one. In these rounds there
is information leaked, but we also increment $C$, and thus there cannot be too many of
these before terminating. 3) The noise is intermediate, such that we do not do an update
and increment $C$, but might if we switched to an adjacent database. In principle there
may be as many as $k$ such rounds; however, it will turn out that with high probability the
number of such rounds is not much bigger than $B$.

We then complete the proof by applying Theorem 2 to bound the expected privacy
loss over the course of all the rounds, and apply Azuma’s inequality to argue that except
with probability $\delta$, the total privacy loss does not exceed $\varepsilon$.


4.2 Utility Analysis 


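Theorem 4. Let $D \in \mathbb{N}^{|\mathcal{X}|}$ be any database, and let $U$ be a $B(\alpha)$-iterative database
construction for query class $\mathcal{Q}$. Then for any $\beta, \varepsilon, \delta > 0$, Algorithm 1 is $(\alpha, \beta)$-accurate
for $\mathcal{Q}$, as long as $T(\alpha) \in [4\alpha/3, 2\alpha]$.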


Proof (Proof sketch). Roughly, the argument is as follows: Assume we did not add any
noise to the queries. Then we would answer each query with the exactly correct answer
$A^{(t)} = Q^{(t)}(D)$, or with the hypothesis answer $\tilde{A}^{(t)} = Q^{(t)}(D^{(t)})$ so long as $\tilde{A}^{(t)}$ is
sufficiently close to $A^{(t)}$. Essentially, all we do in the proof is show that this intuition
remains correct when noise is added.

When adding noise we answer with either $A^{(t)} + Z^{(t)}$ or $\tilde{A}^{(t)}$, so long as $\tilde{A}^{(t)}$ is
sufficiently close to $A^{(t)} + Z^{(t)}$. It is not hard to argue that $Z^{(t)}$ remains small in every
round, and thus the answers in the latter case are not much less accurate than the answers
in the former case.

What remains to be shown is that the mechanism does not terminate early due to
the condition $C = B$. In order to do this, we show that the sequence of updates forms
a database update sequence, and thus cannot be too long if $U$ is an efficient iterative
database construction. To this end, we argue that $Z^{(t)}$ is sufficiently small that
the condition for performing an update ($|A^{(t)} + Z^{(t)} - \tilde{A}^{(t)}| > T$, where $\tilde{A}^{(t)} = Q^{(t)}(D^{(t)})$
is the answer on the current hypothesis) is sufficient to ensure
that the query is a good distinguisher ($|A^{(t)} - \tilde{A}^{(t)}| \ge \alpha$).


In order to get the best accuracy parameters, one can just solve the equation
$\alpha = 3T(\alpha)/4$; substituting $T(\alpha) = 4\sigma(\alpha) \cdot \log(2k/\beta)$, this is the same as solving the
following equation for $\alpha$: $\alpha = 3\sigma(\alpha) \cdot \log(2k/\beta)$. Using this method we obtain bounds on the error
for various IDCs, which are summarized both in Table 1 and in the full version.


5 An Iterative Database Construction Based on Frieze/Kannan 


In this section we describe and analyze an iterative database construction based on the
Frieze/Kannan “cut decomposition”. Although the style of analysis we use was
originally applied specifically to cuts in [10], their argument generalizes to arbitrary
linear queries. To our knowledge, such a generalization was first observed in [20].
Note that the sum in Algorithm 2 denotes entrywise vector addition.


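Theorem 5. Let $D \in \mathbb{N}^{|\mathcal{X}|}$ be a dataset. For any $\alpha > 0$, $U^{FK}_{\alpha}$ is a $B(\alpha)$-iterative
database construction for a class of linear queries $\mathcal{Q}$, where $B(\alpha) = \|D\|_2^2\,|\mathcal{X}|/\alpha^2$.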
Algorithm 2. The Frieze/Kannan-based IDC
$U^{FK}_{\alpha}(D, Q, A)$:
If: $D = \emptyset$ then: output $D' = \mathbf{0}$
Else if: $Q(D) - A > 0$ then: output $D' = D - \frac{\alpha}{|\mathcal{X}|} \cdot Q$
Else if: $Q(D) - A < 0$ then: output $D' = D + \frac{\alpha}{|\mathcal{X}|} \cdot Q$


Proof (Proof sketch). Let $D \in \mathbb{N}^{|\mathcal{X}|}$ be any database and let $\{(D^{(t)}, Q^{(t)}, A^{(t)})\}_{t=1,\ldots,B}$
be a $(U^{FK}_{\alpha}, D, \mathcal{Q}, \alpha, B)$-database update sequence (Definition 4). We want to show that
$C \le \|D\|_2^2\,|\mathcal{X}|/\alpha^2$. Specifically, after $\|D\|_2^2\,|\mathcal{X}|/\alpha^2$ invocations of $U^{FK}_{\alpha}$, the database
$D^{(\|D\|_2^2 |\mathcal{X}|/\alpha^2)}$ is $(\alpha, \mathcal{Q})$-accurate for $D$, and thus there cannot be a sequence of more
than $\|D\|_2^2\,|\mathcal{X}|/\alpha^2$ queries that satisfy property 2 of Definition 4.

In order to formalize this intuition, we use a potential argument as in […] to show
that for every $t = 1, 2, \ldots, B$, $D^{(t+1)}$ is significantly closer to $D$ than $D^{(t)}$. Specifically,
our potential function is the squared $L_2$ norm of the database $D - D^{(t)}$, defined as
$\|D\|_2^2 = \sum_{i \in \mathcal{X}} D(i)^2$. Observe that $\|D - D^{(1)}\|_2^2 = \|D\|_2^2$, and that the potential is always
non-negative. Thus it suffices to show, as we do in the full proof, that in every step the
potential decreases by $\alpha^2/|\mathcal{X}|$.
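
The potential argument can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions, not the authors' code: the names `fk_update` and `potential` are hypothetical, the empty hypothesis is represented by `None`, and a tie `Q(D) - A = 0` (which cannot occur in a database update sequence) is arbitrarily routed to the "plus" branch.

```python
def evaluate(query, database):
    """Linear query: the inner product <Q, D>."""
    return sum(q * d for q, d in zip(query, database))

def fk_update(hypothesis, query, answer, alpha):
    """One Frieze/Kannan-style step: shift the hypothesis by alpha/|X| along Q."""
    size = len(query)
    if hypothesis is None:                  # empty input: start from the zero vector
        return [0.0] * size
    step = alpha / size
    if evaluate(query, hypothesis) - answer > 0:
        return [h - step * q for h, q in zip(hypothesis, query)]
    return [h + step * q for h, q in zip(hypothesis, query)]

def potential(D, hypothesis):
    """Squared L2 distance between the true database and the hypothesis."""
    return sum((d - h) ** 2 for d, h in zip(D, hypothesis))

# Toy example: the hypothesis underestimates the query by at least alpha.
D = [3.0, 0.0, 1.0, 2.0]
alpha = 1.0
Q = [1.0, 1.0, 0.0, 0.0]
h0 = fk_update(None, Q, 0.0, alpha)
h1 = fk_update(h0, Q, evaluate(Q, D), alpha)
drop = potential(D, h0) - potential(D, h1)   # should be at least alpha^2 / |X|
```

In this example the exact answer is passed as `A`; in a database update sequence any `A` within `alpha` of the truth gives the same sign and hence the same step.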


Corollary 1. Let […]. Then Algorithm 1 instantiated with $U^{FK}_{\alpha}$ is
$(\varepsilon, \delta)$-differentially private and an $(\alpha, \beta)$-accurate interactive release
mechanism for query set $\mathcal{Q}$ with
$$\alpha = O\left(\frac{n_2^{1/4}\,|\mathcal{X}|^{1/4}\sqrt{\log(k/\beta)\log(1/\delta)}}{\varepsilon^{1/2}}\right),$$
where $n_2 = \|D\|_2^2$. Note that for databases that are subsets of the data universe (rather than
multisets), $n_2 = n$.


Remark 1. For the setting in which the database represents a graph and the query set
contains all cut queries, this bound is $\tilde{O}(\sqrt{|V|}\,|E|^{1/4}/\sqrt{\varepsilon})$. This improves on the
accuracy of the multiplicative weights IDC for dense graphs with $|E| \ge \Omega(|V|^2/\log|V|)$.


6 Results for Synthetic Data 


In this section, we consider the more demanding task of efficiently releasing synthetic
data for the class of cut queries on graphs. Our algorithm is simple, and is based on
releasing a noisy histogram. Note that for a graph, $|\mathcal{X}| = \binom{|V|}{2}$, and $D = E$, so as long
as $|E| = \Omega(|V|)$, the universe is at most a polynomial in the database size. (Moreover,
it is easy to show that there does not exist any $(\varepsilon, 0)$-private mechanism that has error
$o(|V|)$, so the only interesting cases are when $|E| = \Omega(|V|)$.)
Consider a database whose elements are drawn from $\mathcal{X}$; we represent this as a
vector (histogram) $D \in \mathbb{N}^{|\mathcal{X}|}$. Let $\hat{D} = D + (Y_1, \ldots, Y_{|\mathcal{X}|})$ be a “noisy” database, where
each $Y_i \sim \mathrm{Lap}(1/\varepsilon)$ is an independent draw from the Laplace distribution. Note that
by Theorem 1 the procedure which on input $D$ releases the noisy database $\hat{D}$ preserves
$(\varepsilon, 0)$-differential privacy. This follows because the histogram vector can be viewed as
simply the evaluation of the identity query $Q : \mathbb{N}^{|\mathcal{X}|} \to \mathbb{N}^{|\mathcal{X}|}$, which can be easily seen
to be 1-sensitive. At this stage, we could release $\hat{D}$ and be satisfied that we have
designed a private algorithm. There are two issues: first, we must analyze the utility
guarantees that $\hat{D}$ has with respect to our query set $\mathcal{Q}$. Second, $\hat{D}$ is not quite synthetic data.
It will be a vector with possibly negative entries, and so does not represent a histogram.
Interpreted as a graph, it will be a weighted graph with negative edge weights. Such
an answer may be insufficient for some applications, so in Section 6.1 we show how to
convert such an answer into a $[0, 1]$-weighted graph with similar accuracy guarantees.

The utility guarantee of this procedure over the collections $\mathcal{Q}$ of linear queries is
also not difficult to analyze; recall that each query $Q \in \mathcal{Q}$ is a vector in $[0, 1]^{|\mathcal{X}|}$, and on
any database $D$ evaluates to $Q(D) = \langle Q, D \rangle$.


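Lemma 1. Suppose that $\mathcal{Q} \subseteq [0,1]^{|\mathcal{X}|}$ is some collection of linear queries. For the
case $|\mathcal{Q}| \le (\beta/2)\,2^{|\mathcal{X}|/6}$, it holds that with probability at least $1 - \beta$, for every query
$Q \in \mathcal{Q}$, the noisy database $\hat{D}$ satisfies $|Q(D) - Q(\hat{D})| \le \varepsilon^{-1}\sqrt{6|\mathcal{X}|\log(|\mathcal{Q}|/\beta)}$.
For general $\mathcal{Q}$, the error bound is $O(\varepsilon^{-1}\sqrt{|\mathcal{X}|}\,\log(|\mathcal{X}|/\beta)\,\log(|\mathcal{Q}|/\beta))$.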


The proof of this lemma uses standard moment-generating function techniques and is 
deferred to the full version. 

In summary, the bounds on the error are $\approx \varepsilon^{-1}\sqrt{|\mathcal{X}|\log|\mathcal{Q}|}$, with some correction
terms depending on whether the size of the query set is at most $2^{O(|\mathcal{X}|)}$ or larger.
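
As a rough illustration of the noisy-histogram release described above, the following Python sketch adds independent Laplace noise to each histogram entry and evaluates a linear query on the result. The helper names are hypothetical, and the Laplace sampler (a difference of two exponential draws) is one standard choice rather than anything prescribed by the text.

```python
import random

def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two independent exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(D, epsilon, rng):
    """Add independent Lap(1/epsilon) noise to every entry of the histogram D."""
    return [d + laplace(1.0 / epsilon, rng) for d in D]

def evaluate(query, database):
    """Linear query: the inner product <Q, D>."""
    return sum(q * d for q, d in zip(query, database))

rng = random.Random(0)                 # fixed seed only to make the sketch repeatable
D = [2, 0, 1, 3]                       # toy histogram over a universe of size 4
D_hat = noisy_histogram(D, epsilon=0.5, rng=rng)
Q = [1, 0, 1, 0]
error = evaluate(Q, D_hat) - evaluate(Q, D)   # equals the sum of the noise on entries 0 and 2
```

Because the query is linear, its error is exactly the query applied to the noise vector, which is why the error of a query touching $m$ entries behaves like a sum of $m$ independent Laplace variables.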


6.1 Randomized Response and Synthetic Data for Cut Queries 


For the case of cuts in a graph on a vertex set $V$, the database is a vector in $\{0,1\}^{\binom{|V|}{2}}$,
and the noisy database just adds independent $\mathrm{Lap}(1/\varepsilon)$ noise to each bit value. Since
the query set $\mathcal{Q}_{\mathrm{cuts}}$ has size $2^{2|V|}$ (namely it consists of all $(S, T)$ pairs), we have
$|\mathcal{Q}_{\mathrm{cuts}}| \le (\beta/2)\,2^{|\mathcal{X}|/6}$ for all reasonable $\beta$ and $|V|$, and so we can use the randomized
response analysis above to get accuracy
$$O\left(\sqrt{|\mathcal{X}|\log(|\mathcal{Q}_{\mathrm{cuts}}|/\beta)}\,/\,\varepsilon\right) = O\left(\sqrt{|V|^3 + |V|^2\log(1/\beta)}\,/\,\varepsilon\right)$$
with probability at least $1 - \beta$. In fact, one can give a slightly tighter analysis where the
accuracy depends on the size of the sets $S, T$: by observing that the number of random
variables participating in a cut query $(S, T)$ is exactly $|S||T|$, one can show that the
accuracy for all cuts is whp $O(\varepsilon^{-1}\sqrt{|V||S||T|})$.

Viewing the noisy database $\hat{D}$ as a weighted graph $\hat{G}$, where the weight of $(u, v)$
is $\mathbf{1}_{(u,v) \in E(G)} + \mathrm{Lap}(1/\varepsilon)$, note that $\hat{G}$ has negative weight edges and hence cannot
be considered synthetic data. We can remedy the situation (using the idea of solving a
suitable linear program [2], [8]):


Lemma 2 (Synthetic Data for Cuts). There is a computationally efficient
$(\varepsilon, 0)$-differentially private randomized algorithm that takes an unweighted graph $G$ and
outputs a synthetic graph $G'$ such that, with high probability, $\|G - G'\|_{C} \le O(|V|^{3/2}/\varepsilon)$;
that is, all cuts in $G$ and $G'$ are within $O(|V|^{3/2}/\varepsilon)$ additive error.
The proof is deferred to the full version, but the idea is straightforward: we write a
linear program with exponentially many constraints to solve for a synthetic database, 
and use an SDP-based approximation algorithm of [LL] for the cut-norm problem as an 
approximate separation oracle to solve the LP. 


7 Towards Improving on Randomized Response for Synthetic Data 


In this section, we consider one possible avenue towards giving an efficient algorithm 
for privately generating synthetic data for graph cuts that improves over randomized 
response. We first show how generically, any efficient Iterative Database Construction 
algorithm can be used to give an efficient offline algorithm for privately releasing syn- 
thetic data when paired with an efficient distinguisher. The analysis here follows the 
analysis of [12], who analyzed the corresponding algorithm when instantiated with the 
multiplicative weights algorithm, rather than a generic Iterative Database Construction 
algorithm. 

We will pair an Iterative Database Construction algorithm for a class of queries C 
with a corresponding distinguisher. 


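Definition 6 ($(F(\varepsilon), \gamma)$-Private Distinguisher). Let $\mathcal{Q}$ be a set of queries, let $\gamma > 0$
and let $F(\varepsilon) : \mathbb{R}^{+} \to \mathbb{R}$ be a function. An algorithm $\mathrm{Distinguish}_{\varepsilon} : \mathbb{N}^{|\mathcal{X}|} \times \mathbb{N}^{|\mathcal{X}|} \to \mathcal{Q}$ is
an $(F(\varepsilon), \gamma)$-Private Distinguisher for $\mathcal{Q}$ if for every setting of the privacy parameter
$\varepsilon$, it is $\varepsilon$-differentially private with respect to $D$ and if for every $D, D' \in \mathbb{N}^{|\mathcal{X}|}$ it outputs
a $Q^* \in \mathcal{Q}$ such that $|Q^*(D) - Q^*(D')| \ge \max_{Q \in \mathcal{Q}} |Q(D) - Q(D')| - F(\varepsilon)$ with
probability at least $1 - \gamma$.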


We present the algorithm in the full version, but the idea is very simple. Rather than 
waiting for a query to arrive online that induces an update step, we find queries which 
will induce update steps using the distinguisher. The IDC algorithm will guarantee that 
there will not be too many update steps, and so an efficient distinguisher will yield an 
efficient algorithm for releasing synthetic data. 


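Theorem 6. There is an $(\varepsilon, \delta)$-differentially private mechanism for releasing synthetic
data such that given an $(F(\varepsilon), \gamma)$-private distinguisher and a $B(\alpha)$-IDC, it is
$(\alpha, \beta)$-accurate for:
$$\alpha \ge \max\left[\frac{16\sqrt{B(\alpha)\log(1/\delta)}\,\log(2B(\alpha)/\beta)}{\varepsilon},\; F\left(\frac{\varepsilon}{4\sqrt{B(\alpha)\log(1/\delta)}}\right)\right]$$
as long as $\gamma \le \beta/(2B(\alpha))$.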


We defer the proof until the full version. Note that the running time of the algorithm 
is dominated by the running time of the IDC algorithm and of the distinguishing al- 
gorithm: efficient IDC algorithms paired with efficient distinguishing algorithms for a 
class of queries Q automatically correspond to efficient algorithms for privately releas- 
ing synthetic data useful for Q. For the class of graph cut queries, both the multiplica- 
tive weights IDC and the Frieze/Kannan IDC are computationally efficient. Therefore, 
one approach to finding a computationally efficient algorithm for releasing synthetic
data useful for cut queries is to find an efficient private distinguisher for cut queries. 

One curious aspect of this approach is that it might in fact be computationally easier 
to release a larger class of queries than cut queries, even though this is a strictly more 
difficult task from an information theoretic perspective. For example, solving the distin- 
guishing problem for cut queries on graphs D and D’ is equivalent to finding a pair of 
sets (S, T) which witness the cut-norm on the graph D — D’. On the other hand, solv- 
ing the distinguishing problem for rank-1 queries (which include cut queries, and are a 
larger class) is equivalent to finding the best rank-1 approximation to the adjacency ma- 
trix D — D'. The former problem is NP-hard, whereas the latter problem can be quickly 
solved non-privately using the singular value decomposition. 


Corollary 2. An efficient $(F(\varepsilon), \gamma)$-distinguisher for the class of rank-1 queries for
$F(\varepsilon) = T/\varepsilon$ would yield an $(\alpha, \beta)$-accurate mechanism for releasing synthetic data
for graph cuts (and all rank-1 queries) for any $\beta \ge \Omega(\exp(-\varepsilon T))$ and:
$\alpha_{MW} \ge 2^{7/4}\,\varepsilon^{-1/2}\sqrt{T m}\,(\log|V|\log(1/\delta))^{1/4}$ using the multiplicative weights IDC, or:
$\alpha_{FK} \ge 2\,\varepsilon^{-1/2}\,(m\log(1/\delta))^{1/4}\sqrt{|V|\,T}$ using the Frieze/Kannan IDC.


The proof, deferred to the full version, only requires plugging in the parameters for 
these two IDC algorithms. We remark that for the class of rank-1 queries, an efficient 
$(F(\varepsilon), \gamma)$-distinguisher with $F(\varepsilon) = \tilde{O}(|V|/\varepsilon)$ would be sufficient to yield an efficient
algorithm for releasing synthetic data useful for cut queries, with guarantees matching 
those of the best known algorithms for the interactive case, as listed in Table 1. For
graphs for which the size of the edge set $m \le o(n^2)$, this would yield an improvement
over our randomized response mechanism, which is the best mechanism currently for 
privately releasing synthetic data for graph cuts. We observe that such a distinguisher 
is information-theoretically possible, and the only question is whether such a private 
distinguisher exists that is also computationally efficient. To see this, observe that an 
$O(|V|)$-net for the set of all rank-1 queries can be constructed by considering all pairs
of vectors $x, y \in \{0, 1/|V|, 2/|V|, \ldots, 1\}^{|V|}$ and their associated outer-products $x \cdot y^{T}$.
Since there are at most $|V|^{2|V|}$ such pairs, the exponential mechanism serves as an
inefficient $F(\varepsilon)$ distinguisher for $F(\varepsilon) = O(|V|\log|V|/\varepsilon)$.

We note that a distinguisher for rank-1 queries must simply give a good rank-1
approximation to the matrix $D - D'$, which will always be symmetric in this setting
(because the hypothesis at every step is simply the adjacency matrix of an undirected
graph, as of course is the private database), and hence an algorithm for finding accurate
rank-1 approximations merely for symmetric matrices would already yield an algorithm
for releasing synthetic data for cuts! Unlike classes of queries like conjunctions, for
which there are imposing barriers to privately outputting useful synthetic data [21], [12],
there are as far as we know no such barriers to improving our randomized-response
based results for synthetic data for graph cuts. We leave finding such an algorithm, for
privately giving low rank approximations to matrices, as an intriguing open problem.
A Other Iterative Database Construction Algorithms


In this section, we demonstrate how the median mechanism and the multiplicative 
weights mechanism fit into the IDC framework. These mechanisms apply to general 
classes of linear queries Q. 


A.1 The Median Mechanism 


In this section, we show how to use the median database subroutine as an Iterative 
Database Construction. 


Definition 7 (Median Datastructure). A median datastructure $\mathbf{D}$ is a collection of
databases $\mathbf{D} \subset \mathbb{N}^{|\mathcal{X}|}$. Any query can be evaluated on a median datastructure as follows:
$Q(\mathbf{D}) = \mathrm{Median}(\{Q(D') : D' \in \mathbf{D}\})$.


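Algorithm 3. The Median Mechanism (MM) Algorithm
$U^{MM}_{\alpha}(\mathbf{D}^{t}, Q^{(t)}, A^{(t)})$:
If: $\mathbf{D}^{t} = \emptyset$ then: output $\mathbf{D}^{0} = \{D \in \mathbb{N}^{|\mathcal{X}|} : |D| = n^2\log k/\alpha^2\}$
Else if: $Q^{(t)}(\mathbf{D}^{t}) - A^{(t)} > 0$ then: output $\mathbf{D}' = \mathbf{D}^{t} \setminus \{D \in \mathbf{D}^{t} : Q^{(t)}(D) \ge Q^{(t)}(\mathbf{D}^{t})\}$
Else if: $Q^{(t)}(\mathbf{D}^{t}) - A^{(t)} < 0$ then: output $\mathbf{D}' = \mathbf{D}^{t} \setminus \{D \in \mathbf{D}^{t} : Q^{(t)}(D) \le Q^{(t)}(\mathbf{D}^{t})\}$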


Theorem 7. The Median Mechanism algorithm is a $B(\alpha) = n^2\log|\mathcal{X}|\log k/\alpha^2$ iterative
database construction algorithm for every class of $k$ linear queries $\mathcal{Q}$.


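Proof. Let $D \in \mathbb{N}^{|\mathcal{X}|}$ be any database and consider a $(U^{MM}_{\alpha}, D, \mathcal{Q}, \alpha, B)$-database
update sequence $\{(\mathbf{D}^{t}, Q^{(t)}, A^{(t)})\}_{t=1,\ldots,B}$. It will be sufficient if we can show that
$B(\alpha) \le n^2\log|\mathcal{X}|\log k/\alpha^2$. Specifically, that after $n^2\log|\mathcal{X}|\log k/\alpha^2$ invocations
of $U^{MM}_{\alpha}$ the median datastructure $\mathbf{D}^{n^2\log|\mathcal{X}|\log k/\alpha^2}$ is $(\alpha, \mathcal{Q})$-accurate for $D$. The
argument is simple. First, we have a simple fact from [5]: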


Claim. For any set of $k$ linear queries $\mathcal{Q}$ and any database $D$ of size $n$, there is a
database $D'$ of size $|D'| = n^2\log k/\alpha^2$ so that $D'$ is $\alpha$-accurate for $D$ with respect to $\mathcal{Q}$.


From this claim, we have that $|\mathbf{D}^{t}| \ge 1$ for all $t$, and so $\mathbf{D}^{t}$ can always be used to evaluate
queries. On the other hand, each update step eliminates half of the databases in the
median datastructure: $|\mathbf{D}^{t}| \le |\mathbf{D}^{t-1}|/2$. This is because the update step eliminates
every database either above or below the median with respect to the last query. Initially
$|\mathbf{D}^{0}| = |\mathcal{X}|^{n^2\log k/\alpha^2}$, and so there can be at most $B(\alpha) \le n^2\log|\mathcal{X}|\log k/\alpha^2$ update
steps before we would have $|\mathbf{D}^{t}| < 1$, a contradiction.
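
The halving step in this argument can be seen on a tiny example. The following Python sketch is only an illustration: the function names are hypothetical, the collection is enumerated explicitly (feasible only for very small universes), and databases are enumerated as multisets rather than ordered tuples, which changes the initial count but not the halving behaviour.

```python
import statistics
from itertools import combinations_with_replacement

def all_databases(universe_size, m):
    """All histograms over the universe containing exactly m rows."""
    dbs = []
    for rows in combinations_with_replacement(range(universe_size), m):
        hist = [0] * universe_size
        for r in rows:
            hist[r] += 1
        dbs.append(tuple(hist))
    return dbs

def evaluate(query, database):
    """Linear query: the inner product <Q, D>."""
    return sum(q * d for q, d in zip(query, database))

def median_answer(query, collection):
    """Evaluate a query on a median datastructure."""
    return statistics.median(evaluate(query, D) for D in collection)

def mm_update(collection, query, answer):
    """Discard every database on the wrong side of (or at) the median."""
    med = median_answer(query, collection)
    if med - answer > 0:
        return [D for D in collection if evaluate(query, D) < med]
    return [D for D in collection if evaluate(query, D) > med]

collection = all_databases(universe_size=3, m=2)
Q = [1, 0, 0]
survivors = mm_update(collection, Q, answer=2)   # the median answer 0.5 is too low
```

Each call to `mm_update` keeps at most half of the collection, so the number of updates is at most the base-two logarithm of the initial collection size, mirroring the counting step in the proof.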


A.2 The Multiplicative Weights Mechanism 


In this section we show how to use the multiplicative weights subroutine as an Iterative
Database Construction. The analysis of the multiplicative weights algorithm is not
new, and follows [15]. It will be convenient to think of our databases in this section as
probability distributions, i.e. normalized so that $\|D\|_1 = 1$. Note that if we are $\alpha/n$-accurate
for the normalized database, we are $\alpha$-accurate for the un-normalized database
with respect to any set of linear queries.


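Algorithm 4. The Multiplicative Weights (MW) Algorithm
$U^{MW}_{\alpha}(D^{t}, Q^{(t)}, A^{(t)})$:
Let $\eta \leftarrow \alpha/(2n)$.
If: $D^{t} = \emptyset$ then: output $D' \in \mathbb{R}^{|\mathcal{X}|}$ such that $D'_i = 1/|\mathcal{X}|$ for all $i$.
if $A^{(t)} < Q^{(t)}(D^{t})$ then
    Let $r_t = Q^{(t)}$
else
    Let $r_t = 1 - Q^{(t)}$
end if
Update: For all $i \in [|\mathcal{X}|]$ let
    $\hat{D}^{t+1}_i = \exp(-\eta\, r_t(x_i)) \cdot D^{t}_i$, and $D^{t+1}_i = \hat{D}^{t+1}_i / \sum_{j} \hat{D}^{t+1}_j$
    (the normalization used in the proof of Lemma 3).
Output $D^{t+1}$.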


Theorem 8. The Multiplicative Weights algorithm is a $B(\alpha) = 4n^2\log|\mathcal{X}|/\alpha^2$ iterative
database construction algorithm for every class of linear queries $\mathcal{Q}$.


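Proof. Let $D \in \mathbb{N}^{|\mathcal{X}|}$ be any database and consider a $(U^{MW}_{\alpha}, D, \mathcal{Q}, \alpha, B)$-database
update sequence $\{(D^{t}, Q^{(t)}, A^{(t)})\}_{t=1,\ldots,B}$. It will be sufficient if we can show that
$B(\alpha) \le 4n^2\log|\mathcal{X}|/\alpha^2$. Specifically, that after $4n^2\log|\mathcal{X}|/\alpha^2$ invocations of $U^{MW}_{\alpha}$,
the database $D^{(4n^2\log|\mathcal{X}|/\alpha^2)}$ is $(\alpha, \mathcal{Q})$-accurate for $D$. First let $\bar{D} \in \mathbb{R}^{|\mathcal{X}|}$ be a
normalization of the database $D$: $\bar{D}_i = D_i/\|D\|_1$. Note that for any linear query,
$Q(D) = n \cdot Q(\bar{D})$. We define:
$$\Psi_t \stackrel{\mathrm{def}}{=} D(\bar{D}\,\|\,D^{t}) = \sum_{i=1}^{|\mathcal{X}|} \bar{D}_i \log\left(\frac{\bar{D}_i}{D^{t}_i}\right).$$
We begin with a simple fact:
Claim ([15]). For all $t$: $\Psi_t \ge 0$, and $\Psi_0 \le \log|\mathcal{X}|$.

We will argue that in every step for which $|Q^{(t)}(\bar{D}) - Q^{(t)}(D^{t})| \ge \alpha/n$ the potential
drops by at least $\alpha^2/4n^2$. Because the potential begins at $\log|\mathcal{X}|$, and must always be
non-negative, we know that there can be at most $B(\alpha) \le 4n^2\log|\mathcal{X}|/\alpha^2$ steps before
the algorithm outputs a database $D^{t}$ such that $\max_{Q \in \mathcal{Q}} |Q(\bar{D}) - Q(D^{t})| < \alpha/n$, which
is exactly the condition that we want.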
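
The potential drop can be observed on a small example. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the empty hypothesis is represented by `None`, and the update renormalizes the weights so that the hypothesis stays a probability distribution.

```python
import math

def evaluate(query, database):
    """Linear query: the inner product <Q, D>."""
    return sum(q * d for q, d in zip(query, database))

def mw_update(hypothesis, query, answer, eta):
    """One multiplicative weights step on a normalized hypothesis."""
    size = len(query)
    if hypothesis is None:                       # empty input: uniform distribution
        return [1.0 / size] * size
    if answer < evaluate(query, hypothesis):
        penalty = list(query)                    # r_t = Q: hypothesis overestimates
    else:
        penalty = [1.0 - q for q in query]       # r_t = 1 - Q: hypothesis underestimates
    weights = [h * math.exp(-eta * r) for h, r in zip(hypothesis, penalty)]
    total = sum(weights)
    return [w / total for w in weights]

def relative_entropy(D_bar, hypothesis):
    """Potential: relative entropy between the normalized database and the hypothesis."""
    return sum(d * math.log(d / h) for d, h in zip(D_bar, hypothesis) if d > 0)

# Toy example with alpha/n = 0.25, hence eta = alpha/(2n) = 0.125.
D_bar = [0.5, 0.0, 0.25, 0.25]
eta = 0.125
Q = [1.0, 0.0, 0.0, 0.0]
h0 = mw_update(None, Q, 0.0, eta)
h1 = mw_update(h0, Q, evaluate(Q, D_bar), eta)
drop = relative_entropy(D_bar, h0) - relative_entropy(D_bar, h1)
```

Here the query separates the hypothesis from the normalized database by exactly `alpha/n`, and `drop` can be compared against the claimed lower bound of `alpha^2 / (4 n^2) = eta^2`.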
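Lemma 3 ([15]).
$$\Psi_t - \Psi_{t+1} \ge \eta\left(r_t(D^{t}) - r_t(\bar{D})\right) - \eta^2.$$

Proof.
$$\Psi_t - \Psi_{t+1} = \sum_{i=1}^{|\mathcal{X}|} \bar{D}_i \log\left(\frac{D^{t+1}_i}{D^{t}_i}\right)$$
$$= -\eta\, r_t(\bar{D}) - \log\left(\sum_{i=1}^{|\mathcal{X}|} \exp(-\eta\, r_t(x_i))\, D^{t}_i\right)$$
$$\ge -\eta\, r_t(\bar{D}) - \log\left(\sum_{i=1}^{|\mathcal{X}|} D^{t}_i \left(1 + \eta^2 - \eta\, r_t(x_i)\right)\right)$$
$$\ge \eta\left(r_t(D^{t}) - r_t(\bar{D})\right) - \eta^2.$$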


The rest of the proof now follows easily. By the conditions of an iterative database
construction algorithm, $|A^{(t)} - Q^{(t)}(\bar{D})| < \alpha/(2n)$. Hence, for each $t$ such that
$|Q^{(t)}(\bar{D}) - Q^{(t)}(D^{t})| \ge \alpha/n$, we also have that $Q^{(t)}(\bar{D}) > Q^{(t)}(D^{t})$ if and only if
$A^{(t)} > Q^{(t)}(D^{t})$. In particular, $r_t = Q^{(t)}$ if $Q^{(t)}(D^{t}) - Q^{(t)}(\bar{D}) \ge \alpha/n$, and $r_t = 1 - Q^{(t)}$ if
$Q^{(t)}(\bar{D}) - Q^{(t)}(D^{t}) \ge \alpha/n$. Therefore, by Lemma 3 and the fact that $\eta = \alpha/2n$:
Abstract. Unlike the standard notion of pseudorandom functions
(PRF), a non-adaptive PRF is only required to be indistinguishable from 
random in the eyes of a non-adaptive distinguisher (i.e., one that pre- 
pares its oracle calls in advance). A recent line of research has studied the 
possibility of a direct construction of adaptive PRFs from non-adaptive 
ones, where direct means that the constructed adaptive PRF uses only 
few (ideally, constant number of) calls to the underlying non-adaptive 
PRF. Unfortunately, this study has only yielded negative results, showing
that “natural” such constructions are unlikely to exist (e.g., Myers
[EUROCRYPT ’04], Pietrzak [CRYPTO ’05, EUROCRYPT ’06]).

We give an affirmative answer to the above question, presenting a 
direct construction of adaptive PRFs from non-adaptive ones. Our con- 
struction is extremely simple, a composition of the non-adaptive PRF 
with an appropriate pairwise independent hash function. 


1 Introduction 


A pseudorandom function family (PRF), introduced by Goldreich, Goldwasser,
and Micali […], cannot be distinguished from a family of truly random functions
by an efficient distinguisher who is given oracle access to a random member
of the family. PRFs have an extremely important role in cryptography, allowing
parties, which share a common secret key, to send secure messages, identify
themselves and to authenticate messages […]. In addition, they have many other
applications, essentially in any setting that requires a random function provided
as a black box […]. Different PRF constructions are known in the
literature, whose security is based on different hardness assumptions.
Constructions relevant to this work are those based on the existence of pseudorandom
generators [11] (and thus on the existence of one-way functions […]), and on
the so-called synthesizers […].

In this work we study the question of constructing (adaptive) PRFs from
non-adaptive PRFs. The latter primitive is a (weaker) variant of the standard
PRF we mentioned above, whose security is only guaranteed to hold against
non-adaptive distinguishers (i.e., ones that “write” all their queries before the
first oracle call). Since a non-adaptive PRF can be easily cast as a pseudorandom
generator or as a synthesizer, […] tell us how to construct an (adaptive) PRF
from a non-adaptive one. In both of these constructions, however, the resulting
(adaptive) PRF makes $O(n)$ calls to the underlying non-adaptive PRF (where
$n$ is the input length of the functions).¹

A recent line of work has tried to figure out whether more efficient reductions
from adaptive to non-adaptive PRFs are likely to exist. In a sequence of works
[16, …, 5], it was shown that several “natural” approaches (e.g., composition
or XORing members of the non-adaptive family with itself) are unlikely to work.
See more in Section 1.3.


1.1 Our Result 


We show that a simple composition of a non-adaptive PRF with an appropriate
pairwise independent hash function yields an adaptive PRF. To state our result
more formally, we use the following definitions: a function family $\mathcal{F}$ is a
$T = T(n)$-adaptive PRF, if no distinguisher of running time at most $T$ can tell a
random member of $\mathcal{F}$ from a random function with advantage larger than $1/T$.
The family $\mathcal{F}$ is a $T$-non-adaptive PRF, if the above is only guaranteed to hold
against non-adaptive distinguishers. Given two function families $\mathcal{F}_1$ and $\mathcal{F}_2$, we
let $\mathcal{F}_1 \circ \mathcal{F}_2$ [resp., $\mathcal{F}_1 \oplus \mathcal{F}_2$] be the function family whose members are all pairs
$(f, g) \in \mathcal{F}_1 \times \mathcal{F}_2$, and the action $(f, g)(x)$ is defined as $f(g(x))$ [resp., $f(x) \oplus g(x)$].
We prove the following statements (see Section 3 for the formal statements).


Theorem 1 (Informal). Let $\mathcal{F}$ be a $(p(n) \cdot T(n))$-non-adaptive PRF, where
$p \in$ poly is a function of the evaluating time of $\mathcal{F}$, and let $\mathcal{H}$ be an efficient
pairwise-independent function family mapping strings of length $n$ to $[T(n)]_{\{0,1\}^n}$,
where $[T]_{\{0,1\}^n}$ is the first $T$ elements (in lexicographic order) of $\{0,1\}^n$. Then
$\mathcal{F} \circ \mathcal{H}$ is a $(\sqrt[3]{T(n)}/2)$-adaptive PRF.


For instance, assuming that $\mathcal{F}$ is a $(p(n) \cdot 2^{cn})$-non-adaptive PRF and that $\mathcal{H}$
maps strings of length $n$ to $[2^{cn}]_{\{0,1\}^n}$, Theorem 1 yields that $\mathcal{F} \circ \mathcal{H}$ is a
$(2^{cn/3 - 1})$-adaptive PRF.

Theorem 1 is only useful, however, for polynomial-time computable $T$’s (in
this case, the family $\mathcal{H}$ assumed by the theorem exists, see Section 2.2).
Unfortunately, in the important case where $\mathcal{F}$ is only assumed to be a polynomially
secure non-adaptive PRF, no useful polynomial-time computable $T$ is guaranteed
to exist.²

We suggest two different solutions for handling polynomially secure PRFs.
In Section 4 we observe (following Bellare […]) that a polynomially secure
non-adaptive PRF is a $T$-non-adaptive PRF for some $T \in n^{\omega(1)}$. Since this $T$ can
be assumed without loss of generality to be a power of two, Theorem 1 yields a
non-uniform (uses n-bit advice) polynomially secure adaptive PRF, that makes
a single call to the underlying non-adaptive PRF. Our second solution is to use
the following “combiner” to construct a (uniform) adaptively secure PRF, which
makes $\omega(1)$ parallel calls to the underlying non-adaptive PRF.

¹ We remark that if one is only interested in polynomial security (i.e., no adaptive
PPT distinguishes with more than negligible probability), then $\omega(\log n)$ calls are
sufficient (cf., […, Sec. 3.8.4, Exe. 30]).

² Clearly $\mathcal{F}$ is a $p$-non-adaptive PRF for any $p \in$ poly, but applying Theorem 1 with
$T \in$ poly does not yield a polynomially secure adaptive PRF.


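Corollary 1 (Informal). Let $\mathcal{F}$ be a polynomially secure non-adaptive PRF,
let $\mathcal{H} = \{\mathcal{H}_n\}_{n \in \mathbb{N}}$ be an efficient pairwise-independent length-preserving function
family and let $k(n) \in \omega(1)$ be a polynomial-time computable function.
For $n \in \mathbb{N}$ and $i \in [n]$, let $\hat{\mathcal{H}}_n^i$ be the function family $\hat{\mathcal{H}}_n^i = \{\hat{h} : h \in \mathcal{H}_n\}$, where
$\hat{h}(x) = 0^{n-i}\,\|\,h(x)_{1,\ldots,i}$. […]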


1.2 Proof Idea 


To prove Theorem 1 we first show that $\mathcal{F} \circ \mathcal{H}$ is indistinguishable from $\Pi \circ \mathcal{H}$,
where $\Pi$ is the set of all functions from $\{0,1\}^n$ to $\{0,1\}^{\ell(n)}$ (letting $\ell(n)$
be $\mathcal{F}$’s output length), and then conclude the proof by showing that $\Pi \circ \mathcal{H}$ is
indistinguishable from $\Pi$.


$\mathcal{F} \circ \mathcal{H}$ is indistinguishable from $\Pi \circ \mathcal{H}$. Let $D$ be (a possibly adaptive)
algorithm of running time $T(n)$, which distinguishes $\mathcal{F} \circ \mathcal{H}$ from $\Pi \circ \mathcal{H}$ with
advantage $\varepsilon(n)$. We use $D$ to build a non-adaptive distinguisher $\hat{D}$ of running
time $p(n) \cdot T(n)$, which distinguishes $\mathcal{F}$ from $\Pi$ with advantage $\varepsilon(n)$. Given
oracle access to a function $\phi$, the distinguisher $\hat{D}^{\phi}(1^n)$ first queries $\phi$ on
all the elements of $[T(n)]_{\{0,1\}^n}$. Next it chooses at uniform $h \in \mathcal{H}$, and uses
the stored answers to its queries to emulate $D^{\phi \circ h}(1^n)$.

Since $\hat{D}$ runs in time $p(n) \cdot T(n)$, for some large enough $p \in$ poly, makes
non-adaptive queries, and distinguishes $\mathcal{F}$ from $\Pi$ with advantage $\varepsilon(n)$, the
assumed security of $\mathcal{F}$ yields that $\varepsilon(n) < \frac{1}{p(n) \cdot T(n)}$.

$\Pi \circ \mathcal{H}$ is indistinguishable from $\Pi$. We prove that $\Pi \circ \mathcal{H}$ is statistically
indistinguishable from $\Pi$. Namely, even an unbounded distinguisher (that
makes a bounded number of calls) cannot distinguish between the families.
The idea of the proof is fairly simple. Let $D$ be an $s$-query algorithm trying
to distinguish between $\Pi \circ \mathcal{H}$ and $\Pi$. We first note that the distinguishing
advantage of $D$ is bounded by its probability of finding a collision in a
random $\phi \in \Pi \circ \mathcal{H}$ (in case no collision occurs, $\phi$’s output is uniform). We
next argue that in order to find a collision in $\phi$, the distinguisher $D$ gains
nothing from being adaptive. Indeed, assuming that $D$ found no collision
until the $i$-th call, then it has only learned that $h$ does not collide on these
first $i$ queries. Therefore, a random (or even a constant) query as the $(i+1)$-th
call has the same chance to yield a collision as any other query has. Hence,
we assume without loss of generality that $D$ is non-adaptive, and use the
pairwise independence of $\mathcal{H}$ to conclude that $D$’s probability of finding a
collision, and thus its distinguishing advantage, is bounded by $s(n)^2/T(n)$.
Combining the above two observations, we conclude that an adaptive
distinguisher whose running time is bounded by $\frac{1}{2}\sqrt[3]{T(n)}$ cannot distinguish
$\mathcal{F} \circ \mathcal{H}$ from $\Pi$ (i.e., from a random function) with an advantage better than
$\frac{1}{p(n) \cdot T(n)} + \frac{(\sqrt[3]{T(n)}/2)^2}{T(n)} \le 2/\sqrt[3]{T(n)}$. Namely, $\mathcal{F} \circ \mathcal{H}$ is a $(\sqrt[3]{T(n)}/2)$-adaptive PRF.
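
To make the shape of the construction concrete, here is a toy Python sketch of the composition $f \circ h$. Everything specific in it is an assumption made for illustration: the input length is fixed to 8 bits, the hash is an affine map over GF(2^8) truncated to its low $k$ bits (so that its range is the first $2^k$ strings), and HMAC-SHA256 under a random key merely stands in for a member $f$ of the non-adaptive PRF family. It is not a secure instantiation of the theorem.

```python
import hashlib
import hmac
import secrets

N_BITS = 8
MODULUS = 0x11B   # x^8 + x^4 + x^3 + x + 1, an irreducible polynomial over GF(2)

def gf_mul(a, b):
    """Multiply two elements of GF(2^8): carry-less product reduced by MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & (1 << N_BITS):
            a ^= MODULUS
        b >>= 1
    return result

def sample_hash(k):
    """Sample h(x) = low k bits of a*x + b over GF(2^8); range is {0, ..., 2^k - 1}."""
    a = secrets.randbelow(1 << N_BITS)
    b = secrets.randbelow(1 << N_BITS)
    mask = (1 << k) - 1
    return lambda x: (gf_mul(a, x) ^ b) & mask

def sample_f():
    """Stand-in keyed function (an assumption, not the paper's F)."""
    key = secrets.token_bytes(32)
    return lambda x: hmac.new(key, bytes([x]), hashlib.sha256).digest()

def compose(f, h):
    """The composed function (f, h)(x) = f(h(x))."""
    return lambda x: f(h(x))

g = compose(sample_f(), sample_hash(k=2))
outputs = {g(x) for x in range(1 << N_BITS)}   # at most 2^k distinct values
```

The composed function only ever evaluates `f` on the $2^k$ points in the range of `h`, which is what lets a non-adaptive distinguisher query all of them in advance in the proof sketch above.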


1.3 Related Work 


Maurer and Pietrzak […] were the first to consider the question of building
adaptive PRFs from non-adaptive ones. They showed that in the information
theoretic model, a self composition of a non-adaptive PRF does yield an adaptive
PRF.³

In contrast, the situation in the computational model (which we consider here)
seems very different: Myers […] proved that it is impossible to reprove the result
of […] via fully-black-box reductions. Pietrzak […] showed that under the
Decisional Diffie-Hellman (DDH) assumption, composition does not imply adaptive
security. In […], he showed that the existence of non-adaptive PRFs whose
composition is not adaptively secure yields that a key-agreement protocol exists.
Finally, Cho et al. […] generalized […] by proving that composition of two
non-adaptive PRFs is not adaptively secure iff a (uniform transcript) key agreement
protocol exists. We mention that […], […], [5], and in a sense also […], hold also
with respect to XORing of the non-adaptive families.


2 Preliminaries 


2.1 Notations 


All logarithms considered here are in base two. We let ‘||’ denote string
concatenation. We use calligraphic letters to denote sets, uppercase for random
variables, and lowercase for values. For an integer $t$, we let $[t] = \{1, \ldots, t\}$, and
for a set $\mathcal{S} \subseteq \{0,1\}^*$ with $|\mathcal{S}| \ge t$, we let $[t]_{\mathcal{S}}$ be the first $t$ elements (in
increasing lexicographic order) of $\mathcal{S}$. A function $\mu : \mathbb{N} \to [0, 1]$ is negligible, denoted
$\mu(n) = \mathrm{neg}(n)$, if $\mu(n) = n^{-\omega(1)}$. We let poly denote the set of all polynomials, and
let PPT denote the set of probabilistic algorithms (i.e., Turing machines) that
run in strictly polynomial time.

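Given a random variable $X$, we write $X(x)$ to denote $\Pr[X = x]$, and write
$x \leftarrow X$ to indicate that $x$ is selected according to $X$. Similarly, given a
finite set $\mathcal{S}$, we let $s \leftarrow \mathcal{S}$ denote that $s$ is selected according to the uniform
distribution on $\mathcal{S}$. The statistical distance of two distributions $P$ and $Q$ over
a finite set $\mathcal{U}$, denoted as $\mathrm{SD}(P,Q)$, is defined as

$$\mathrm{SD}(P,Q) = \max_{\mathcal{S} \subseteq \mathcal{U}} |P(\mathcal{S}) - Q(\mathcal{S})| = \frac{1}{2} \sum_{u \in \mathcal{U}} |P(u) - Q(u)|.$$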
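The two expressions for the statistical distance can be checked against each other on a small example. The sketch below is illustrative only: the distributions `P` and `Q` and both function names are hypothetical, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

def sd_half_l1(p, q):
    """Statistical distance as half the L1 distance between P and Q."""
    support = set(p) | set(q)
    return sum(abs(p.get(u, 0) - q.get(u, 0)) for u in support) / 2

def sd_max_event(p, q):
    """Statistical distance as the maximum of |P(S) - Q(S)| over all events S."""
    support = sorted(set(p) | set(q))
    best = Fraction(0)
    for r in range(len(support) + 1):
        for event in combinations(support, r):
            gap = abs(sum(p.get(u, 0) for u in event) - sum(q.get(u, 0) for u in event))
            best = max(best, gap)
    return best

# Hypothetical toy distributions over U = {"00", "01", "10", "11"}.
P = {"00": Fraction(1, 2), "01": Fraction(1, 4), "10": Fraction(1, 4)}
Q = {u: Fraction(1, 4) for u in ("00", "01", "10", "11")}
```

The brute-force maximum enumerates all subsets of the support, so it is only practical for very small sets; the half-L1 form is the one to use in practice.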


3 Specifically, assuming that the non-adaptive PRF is $(Q,\varepsilon)$-non-adaptively secure (i.e., no
$Q$-query non-adaptive algorithm distinguishes it from random with advantage larger
than $\varepsilon$), the resulting PRF is $(Q, \varepsilon(1 + \ln\frac{1}{\varepsilon}))$-adaptively secure.

2.2 Ensemble of Function Families


Let $\mathcal{F} = \{\mathcal{F}_n\colon \mathcal{D}_n \to \mathcal{R}_n\}_{n\in\mathbb{N}}$ stand for an ensemble of function families, where
each $f \in \mathcal{F}_n$ has domain $\mathcal{D}_n$ and its range is contained in $\mathcal{R}_n$. Such an ensemble is
length preserving if $\mathcal{D}_n = \mathcal{R}_n = \{0,1\}^n$ for every $n$.


Definition 1 (efficient function family ensembles). A function family
ensemble $\mathcal{F} = \{\mathcal{F}_n\}_{n\in\mathbb{N}}$ is efficient, if the following hold:


Samplable. $\mathcal{F}$ is samplable in polynomial time: there exists a PPT that, given
$1^n$, outputs (the description of) a uniform element in $\mathcal{F}_n$.

Efficient. There exists a polynomial-time algorithm that, given $x \in \{0,1\}^n$ and
(a description of) $f \in \mathcal{F}_n$, outputs $f(x)$.


Operating on Function Families 


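Definition 2 (composition of function families). Let $\mathcal{F}^1 = \{\mathcal{F}^1_n\colon \mathcal{D}^1_n \to \mathcal{R}^1_n\}_{n\in\mathbb{N}}$
and $\mathcal{F}^2 = \{\mathcal{F}^2_n\colon \mathcal{D}^2_n \to \mathcal{R}^2_n\}_{n\in\mathbb{N}}$ be two ensembles of function families
with $\mathcal{R}^1_n \subseteq \mathcal{D}^2_n$ for every $n$. We define the composition of $\mathcal{F}^1$ with $\mathcal{F}^2$ as
$\mathcal{F}^2 \circ \mathcal{F}^1 = \{\mathcal{F}^2_n \circ \mathcal{F}^1_n\colon \mathcal{D}^1_n \to \mathcal{R}^2_n\}_{n\in\mathbb{N}}$, where $\mathcal{F}^2_n \circ \mathcal{F}^1_n = \{(f_2, f_1) \in \mathcal{F}^2_n \times \mathcal{F}^1_n\}$, and
$(f_2, f_1)(x) = f_2(f_1(x))$.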


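Definition 3 (XOR of function families). Let $\mathcal{F}^1 = \{\mathcal{F}^1_n\colon \mathcal{D}^1_n \to \mathcal{R}^1_n\}_{n\in\mathbb{N}}$
and $\mathcal{F}^2 = \{\mathcal{F}^2_n\colon \mathcal{D}^2_n \to \mathcal{R}^2_n\}_{n\in\mathbb{N}}$ be two ensembles of function families with
$\mathcal{R}^1_n, \mathcal{R}^2_n \subseteq \{0,1\}^{\ell(n)}$ for every $n$. We define the XOR of $\mathcal{F}^1$ with $\mathcal{F}^2$ as
$\mathcal{F}^2 \oplus \mathcal{F}^1 = \{\mathcal{F}^2_n \oplus \mathcal{F}^1_n\colon \mathcal{D}^1_n \cap \mathcal{D}^2_n \to \{0,1\}^{\ell(n)}\}_{n\in\mathbb{N}}$, where $\mathcal{F}^2_n \oplus \mathcal{F}^1_n = \{(f_2, f_1) \in \mathcal{F}^2_n \times \mathcal{F}^1_n\}$,
and $(f_2, f_1)(x) = f_2(x) \oplus f_1(x)$.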


Pairwise Independent Hashing 


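Definition 4 (pairwise independent families). A function family $\mathcal{H} = \{h\colon \mathcal{D} \to \mathcal{R}\}$
is pairwise independent (with respect to $\mathcal{D}$ and $\mathcal{R}$), if

$$\Pr_{h \leftarrow \mathcal{H}}[h(x_1) = y_1 \wedge h(x_2) = y_2] = \frac{1}{|\mathcal{R}|^2}$$

for every distinct $x_1, x_2 \in \mathcal{D}$ and every $y_1, y_2 \in \mathcal{R}$.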
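As a concrete instance of the definition, the following sketch exhaustively checks pairwise independence for a small family. The family used here, all affine maps $x \mapsto Ax \oplus b$ over GF(2) on 2-bit strings, is an illustrative choice and is not taken from the text; the names `affine_hash`, `family` and `is_pairwise_independent` are hypothetical.

```python
from collections import Counter
from itertools import product

N = 2  # toy input/output length in bits

def parity(v):
    return bin(v).count("1") & 1

def affine_hash(rows, b, x):
    """h(x) = A*x XOR b over GF(2); rows[j] is row j of A packed into an int."""
    y = 0
    for j, row in enumerate(rows):
        y |= parity(row & x) << j
    return y ^ b

# The whole family: every N x N bit matrix A together with every offset b.
family = [(rows, b)
          for rows in product(range(2 ** N), repeat=N)
          for b in range(2 ** N)]

def is_pairwise_independent(family, n_bits):
    """Check Pr_h[h(x1)=y1 and h(x2)=y2] = 1/|R|^2 for all distinct x1, x2."""
    size = 2 ** n_bits
    for x1, x2 in product(range(size), repeat=2):
        if x1 == x2:
            continue
        counts = Counter((affine_hash(rows, b, x1), affine_hash(rows, b, x2))
                         for rows, b in family)
        # Every one of the |R|^2 output pairs must be hit by |H| / |R|^2 keys.
        if len(counts) != size ** 2 or set(counts.values()) != {len(family) // size ** 2}:
            return False
    return True
```

Dropping the offset `b` breaks the property, since the all-zero input is then always mapped to zero; the check detects this.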


For every $\ell \in$ poly, the existence of efficient pairwise-independent family
ensembles mapping strings of length $n$ to strings of length $\ell(n)$ is well known
([4]). In this paper we use efficient pairwise-independent function family
ensembles mapping strings of length $n$ to the set $[T(n)]_{\{0,1\}^n}$, where $T(n) \le 2^n$
and is without loss of generality a power of two.⁴ Let $\mathcal{H}$ be an efficient
length-preserving, pairwise-independent function family ensemble and assume that
$t(n) := \log T(n)$ is polynomial-time computable. Then the function family
ensemble

$$\widehat{\mathcal{H}} = \{\widehat{\mathcal{H}}_n = \{\hat{h}\colon h \in \mathcal{H}_n\}\}_{n\in\mathbb{N}}, \quad \text{where } \hat{h}(x) = 0^{n-t(n)} \,\|\, h(x)_{1,\ldots,t(n)},$$

is an efficient pairwise-independent function family ensemble, mapping strings
of length $n$ to the set $[T(n)]_{\{0,1\}^n}$.


4 For our applications (see Section 3), we can always consider $T'(n) = 2^{\lfloor \log(T(n)) \rfloor}$,
which only costs us a factor of two in the resulting security.

Pseudorandom Functions


Definition 5 (pseudorandom functions). An efficient function family
ensemble $\mathcal{F} = \{\mathcal{F}_n\colon \{0,1\}^n \to \{0,1\}^{\ell(n)}\}_{n\in\mathbb{N}}$ is a $(T(n), \varepsilon(n))$-adaptive PRF, if for
every oracle-aided algorithm (distinguisher) $D$ of running time $T(n)$ and large
enough $n$, it holds that

$$\left|\Pr_{f \leftarrow \mathcal{F}_n}[D^f(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[D^\pi(1^n) = 1]\right| \le \varepsilon(n),$$

where $\Pi_n$ is the set of all functions from $\{0,1\}^n$ to $\{0,1\}^{\ell(n)}$. If we limit $D$ above
to be non-adaptive (i.e., it has to write all its oracle calls before making the first
call), then $\mathcal{F}$ is called a $(T(n), \varepsilon(n))$-non-adaptive PRF.

The ensemble $\mathcal{F}$ is a $t$-adaptive PRF, if it is a $(t, 1/t)$-adaptive PRF according
to the above definition. It is a polynomially secure adaptive PRF (for short, adaptive
PRF), if it is a $p$-adaptive PRF for every $p \in$ poly. Finally, it is a super-polynomial
secure adaptive PRF, if it is a $T$-adaptive PRF for some $T(n) \in n^{\omega(1)}$. The same
conventions are also used for non-adaptive PRFs.


Clearly, a super-polynomial secure PRF is also polynomially secure. In the section
"From Polynomial to Super-Polynomial Security" we prove that the converse is also
true: a polynomially secure PRF is also a super-polynomial secure PRF.


3 Our Construction 


In this section we present the main contribution of this paper — a direct con- 
struction of an adaptive pseudorandom function family from a non-adaptive one. 
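Theorem 2 (restatement of the main theorem). Let $T$ be a polynomial-time
computable integer function, let $\mathcal{H} = \{\mathcal{H}_n\colon \{0,1\}^n \to [T(n)]_{\{0,1\}^n}\}$ be an efficient
pairwise independent function family ensemble, and let $\mathcal{F} = \{\mathcal{F}_n\colon \{0,1\}^n \to \{0,1\}^{\ell(n)}\}$
be a $(p(n) \cdot T(n), \varepsilon(n))$-non-adaptive PRF, where $p \in$ poly is
determined by the computation time of $T$, $\mathcal{F}$ and $\mathcal{H}$. Then $\mathcal{F} \circ \mathcal{H}$ is a
$\left(s(n), \varepsilon(n) + \frac{s(n)^2}{T(n)}\right)$-adaptive PRF for every $s(n) < T(n)$.


Theorem 2 yields the following simpler statement.
Corollary 2. Let $T$, $p$ and $\mathcal{H}$ be as in Theorem 2. Assuming $\mathcal{F}$ is a $(p(n)T(n))$-non-adaptive
PRF, then $\mathcal{F} \circ \mathcal{H}$ is a $(\sqrt[3]{T(n)}/2)$-adaptive PRF.


Proof. Applying Theorem 2 with respect to $s(n) = \sqrt[3]{T(n)}/2$ and $\varepsilon(n) = \frac{1}{p(n)T(n)}$
yields that $\mathcal{F} \circ \mathcal{H}$ is a $\left(s(n), \frac{1}{p(n)T(n)} + \frac{1}{4\sqrt[3]{T(n)}}\right)$-adaptive PRF. Since
$\frac{1}{p(n)T(n)} < \frac{1}{\sqrt[3]{T(n)}}$ and $\frac{1}{4\sqrt[3]{T(n)}} < \frac{1}{\sqrt[3]{T(n)}}$, it follows that $\mathcal{F} \circ \mathcal{H}$ is a $(s, 1/s)$-adaptive
PRF.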
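The arithmetic behind Corollary 2 can be replayed with exact fractions. The sketch below assumes hypothetical concrete values ($T = 2^{30}$ and $p = 8$) chosen only so that the cube root is an integer; the function name is likewise an illustration.

```python
from fractions import Fraction

def corollary2_bound(T_cuberoot, p):
    """Return (s, eps + s^2/T, 1/s) for T = T_cuberoot**3 and eps = 1/(p*T)."""
    T = T_cuberoot ** 3
    s = Fraction(T_cuberoot, 2)        # s(n) = cuberoot(T(n)) / 2
    eps = Fraction(1, p * T)           # security of a (p*T)-non-adaptive PRF
    advantage = eps + s * s / T        # bound given by Theorem 2
    return s, advantage, 1 / s

# Hypothetical parameters: T = 2^30, p = 8.
s, advantage, target = corollary2_bound(T_cuberoot=2 ** 10, p=8)
```

With these values the advantage bound is $2^{-33} + 2^{-12}$, which is below $1/s = 2^{-9}$, as the corollary requires.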


To prove Theorem 2, we use the (non efficient) function family ensemble $\Pi \circ \mathcal{H}$,
where $\Pi = \Pi_\ell$ (i.e., the ensemble of all functions from $\{0,1\}^n$ to $\{0,1\}^{\ell}$), and
$\ell = \ell(n)$ is the output length of $\mathcal{F}$. We first show that $\mathcal{F} \circ \mathcal{H}$ is computationally
indistinguishable from $\Pi \circ \mathcal{H}$, and complete the proof by showing that $\Pi \circ \mathcal{H}$ is
statistically indistinguishable from $\Pi$.

3.1 $\mathcal{F} \circ \mathcal{H}$ Is Computationally Indistinguishable from $\Pi \circ \mathcal{H}$


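Lemma 1. Let $T$, $\mathcal{F}$ and $\mathcal{H}$ be as in Theorem 2. Then for every oracle-aided
distinguisher $D$ of running time $T$, there exists a non-adaptive oracle-aided
distinguisher $\widehat{D}$ of running time $p(n) \cdot T(n)$, for some $p \in$ poly (determined by the
computation time of $T$, $\mathcal{F}$ and $\mathcal{H}$), with

$$\left|\Pr_{f \leftarrow \mathcal{F}_n}[\widehat{D}^f(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[\widehat{D}^\pi(1^n) = 1]\right| = \left|\Pr_{g \leftarrow \mathcal{F}_n \circ \mathcal{H}_n}[D^g(1^n) = 1] - \Pr_{g \leftarrow \Pi_n \circ \mathcal{H}_n}[D^g(1^n) = 1]\right|$$

for every $n \in \mathbb{N}$, where $\Pi_n$ is the set of all functions from $\{0,1\}^n$ to $\{0,1\}^{\ell(n)}$.


In particular, the pseudorandomness of $\mathcal{F}$ yields that $\mathcal{F} \circ \mathcal{H}$ is computationally
indistinguishable from the ensemble $\{\Pi_n \circ \mathcal{H}_n\}_{n\in\mathbb{N}}$ by an adaptive distinguisher
of running time $T$.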


Proof. The distinguisher $\widehat{D}$ is defined as follows:
Algorithm 3 ($\widehat{D}$)


Input: $1^n$.
Oracle: a function $\phi$ over $\{0,1\}^n$.


1. Compute $\phi(x)$ for every $x \in [T(n)]_{\{0,1\}^n}$.

2. Set $g = \phi \circ h$, where $h$ is uniformly chosen in $\mathcal{H}_n$.

3. Emulate $D^g(1^n)$: answer a query $x$ to $g$ made by $D$ with $g(x)$, using the
information obtained in Step 1.


Note that $\widehat{D}$ makes $T(n)$ non-adaptive queries to $\phi$, and it can be implemented
to run in time $p(n)T(n)$, for large enough $p \in$ poly. We conclude the proof by
observing that in case $\phi$ is uniformly drawn from $\mathcal{F}_n$, the emulation of $D$ done
in $\widehat{D}^\phi$ is identical to a random execution of $D^g$ with $g \leftarrow \mathcal{F}_n \circ \mathcal{H}_n$. Similarly,
in case $\phi$ is uniformly drawn from $\Pi_n$, the emulation is identical to a random
execution of $D^\pi$ with $\pi \leftarrow \Pi_n \circ \mathcal{H}_n$.
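The reduction of Algorithm 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the first $T$ strings of $\{0,1\}^n$ are represented as the integers `0..T-1`, the oracle is modelled as a hypothetical batch interface `phi_batch`, and the pairwise-independent hash is instantiated (as an illustrative choice, not the text's) by a random affine map over GF(2).

```python
import random

def parity(v):
    return bin(v).count("1") & 1

def sample_h(n_bits, t_bits, rng):
    """Sample x -> A*x XOR b over GF(2), mapping n_bits inputs to t_bits outputs."""
    rows = [rng.randrange(2 ** n_bits) for _ in range(t_bits)]
    b = rng.randrange(2 ** t_bits)
    def h(x):
        y = 0
        for j, row in enumerate(rows):
            y |= parity(row & x) << j
        return y ^ b
    return h

def make_nonadaptive_wrapper(D, n_bits, t_bits, rng):
    """Build D_hat from an adaptive distinguisher D, following Algorithm 3."""
    T = 2 ** t_bits
    def D_hat(phi_batch):
        # Step 1: a single batch of T queries, fixed before any answer is seen.
        table = phi_batch(list(range(T)))
        # Step 2: choose h and define g = phi o h through the table.
        h = sample_h(n_bits, t_bits, rng)
        def g(x):
            return table[h(x)]
        # Step 3: run D, answering its adaptive queries from the table alone.
        return D(g)
    return D_hat
```

The adaptive distinguisher `D` never touches the real oracle: all of its queries are served from the table computed in Step 1, which is what makes `D_hat` non-adaptive.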


3.2 $\Pi \circ \mathcal{H}$ Is Statistically Indistinguishable from $\Pi$


The following lemma is commonly used for proving the security of hash based
MACs (cf., [9, Proposition 6.3.6]), yet for completeness we give a full proof
below.


Lemma 2. Let $n, T$ be integers with $T \le 2^n$, and let $\mathcal{H}$ be a pairwise-independent
function family mapping strings of length $n$ to $[T]_{\{0,1\}^n}$. Let $D$ be
an (unbounded) $s$-query oracle-aided algorithm (i.e., making at most $s$ queries),
then

$$\left|\Pr_{g \leftarrow \Pi \circ \mathcal{H}}[D^g = 1] - \Pr_{\pi \leftarrow \Pi}[D^\pi = 1]\right| \le s^2/T,$$

where $\Pi$ is the set of all functions from $\{0,1\}^n$ to $\{0,1\}^\ell$ (for some $\ell \in \mathbb{N}$).

Proof. We assume for simplicity that $D$ is deterministic (the reduction to the
randomized case is standard) and makes exactly $s$ valid (i.e., inside $\{0,1\}^n$)
distinct queries, and let $\Omega = (\{0,1\}^\ell)^s$. Consider the following random process:


Algorithm 4


1. Emulate $D$, while answering the $i$'th query $q_i$ with a uniformly chosen $a_i \in \{0,1\}^\ell$.

Set $\bar{q} = (q_1, \ldots, q_s)$ and $\bar{a} = (a_1, \ldots, a_s)$.

2. Choose $h \leftarrow \mathcal{H}$.

3. Emulate $D$ again, while answering the $i$'th query $q'_i$ with $a'_i = a_i$ (the same
$a_i$ from Step 1), if $h(q'_i) \notin \{h(q'_j)\}_{j \in [i-1]}$, and with $a'_i = a'_j$, if $h(q'_i) = h(q'_j)$
for some $j \in [i-1]$.

Set $\bar{q}' = (q'_1, \ldots, q'_s)$ and $\bar{a}' = (a'_1, \ldots, a'_s)$.


Let $A$, $Q$, $A'$, $Q'$ and $H$ be the (jointly distributed) random variables induced by
the values of $\bar{a}$, $\bar{q}$, $\bar{a}'$, $\bar{q}'$ and $h$ respectively, in a random execution of the above
process. It is not hard to verify that $A$ is distributed the same as the oracle
answers in a random execution of $D^\pi$ with $\pi \leftarrow \Pi$, and that $A'$ is distributed
the same as the oracle answers in a random execution of $D^g$ with $g \leftarrow \Pi \circ \mathcal{H}$.
Hence, for proving Lemma 2, it suffices to bound the statistical distance between
$A$ and $A'$.

Let Coll be the event that $H(Q_i) = H(Q_j)$ for some $i \ne j \in [s]$. Since the
queries and answers in both emulations of Algorithm 4 are the same until a
collision with respect to $H$ occurs, it follows that

$$\Pr[A \ne A'] \le \Pr[\mathrm{Coll}] \qquad (1)$$

On the other hand, since $H$ is chosen after $Q$ is set, the pairwise independence
of $\mathcal{H}$ yields that

$$\Pr[\mathrm{Coll}] \le s^2/T, \qquad (2)$$

and therefore $\Pr[A \ne A'] \le s^2/T$. It follows that $\Pr[A \in \mathcal{C}] \le \Pr[A' \in \mathcal{C}] + s^2/T$
for every $\mathcal{C} \subseteq \Omega$, yielding that $\mathrm{SD}(A, A') \le s^2/T$.
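The step from pairwise independence to the collision bound (2) can be spelled out as follows. This is a standard union-bound derivation written under the assumptions of the proof (the $s$ queries are distinct, the hash is chosen independently of them, and the range has size $T$); it is given here as a sketch of the omitted calculation.

```latex
\begin{align*}
\Pr_{h \leftarrow \mathcal{H}}[h(x_1) = h(x_2)]
  &= \sum_{y \in \mathcal{R}} \Pr_{h \leftarrow \mathcal{H}}[h(x_1) = y \wedge h(x_2) = y]
   = |\mathcal{R}| \cdot \frac{1}{|\mathcal{R}|^2} = \frac{1}{T}
   \quad \text{for } x_1 \ne x_2, \\
\Pr[\mathrm{Coll}]
  &\le \sum_{1 \le i < j \le s} \Pr[H(Q_i) = H(Q_j)]
   = \binom{s}{2} \cdot \frac{1}{T} \le \frac{s^2}{T}.
\end{align*}
```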


3.3 Putting It Together 
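We are now finally ready to prove Theorem 2.


Proof (of Theorem 2). Let $D$ be an oracle-aided algorithm
of running time $s$ with $s(n) < T(n)$. Lemma 1 yields that

$$\left|\Pr_{g \leftarrow \mathcal{F}_n \circ \mathcal{H}_n}[D^g(1^n) = 1] - \Pr_{g \leftarrow \Pi_n \circ \mathcal{H}_n}[D^g(1^n) = 1]\right| \le \varepsilon(n)$$

for large enough $n$, whereas Lemma 2 yields that

$$\left|\Pr_{g \leftarrow \Pi_n \circ \mathcal{H}_n}[D^g(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[D^\pi(1^n) = 1]\right| \le s(n)^2/T(n)$$

for every $n \in \mathbb{N}$. Hence, the triangle inequality yields that
$\left|\Pr_{g \leftarrow \mathcal{F}_n \circ \mathcal{H}_n}[D^g(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[D^\pi(1^n) = 1]\right| \le \varepsilon(n) + s(n)^2/T(n)$ for
large enough $n$, as requested.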


From Non-adaptive to Adaptive Pseudorandom Functions 365 


3.4 Handling Polynomial Security 


Corollary 2 is only useful when the security of the underlying non-adaptive PRF
(i.e., $T$) is efficiently computable (or when considering non-uniform PRF
constructions). In this section we show how to handle the important
case of a polynomially secure non-adaptive PRF. We use the following "combiner".


Definition 6. Let $\mathcal{H}$ be a function family into $\{0,1\}^n$. For $i \in [n]$, let $\widehat{\mathcal{H}}^i$ be the
function family $\widehat{\mathcal{H}}^i = \{\hat{h}\colon h \in \mathcal{H}\}$, where $\hat{h}(x) = 0^{n-i} \,\|\, h(x)_{1,\ldots,i}$.


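Corollary 3. Let $\mathcal{F}$ be a $T(n)$-non-adaptive PRF, let $\mathcal{H}$ be an efficient
length-preserving pairwise-independent function family ensemble, and let $\mathcal{I}(n) \subseteq [n]$ be a
polynomial-time computable (in $n$) index set. Define the function family ensemble
$\mathcal{G} = \{\mathcal{G}_n\}_{n\in\mathbb{N}}$, where $\mathcal{G}_n = \bigoplus_{i \in \mathcal{I}(n)} \left(\mathcal{F}_n \circ \widehat{\mathcal{H}}^i_n\right)$.

There exists $q \in$ poly such that $\mathcal{G}$ is a $\left(\sqrt[3]{2^{t(n)}}/2\right)$-adaptive PRF, for every
polynomial-time computable integer function $t$, with $t(n) \in \mathcal{I}(n)$ and
$2^{t(n)} \le T(n)/q(n)$.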
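The combiner of Definition 6 and Corollary 3 can be sketched as follows. Everything concrete here is an assumption made for illustration: the toy length `N_BITS`, the use of HMAC-SHA256 as a stand-in for the non-adaptive PRF $\mathcal{F}$, the use of random affine maps over GF(2) as the length-preserving pairwise-independent family, and the choice to read "the first $i$ bits" as the most significant ones.

```python
import hashlib
import hmac
import secrets

N_BITS = 16  # toy input length n (hypothetical)

def parity(v):
    return bin(v).count("1") & 1

def sample_affine(n_bits):
    """Sample a length-preserving map h(x) = A*x XOR b over GF(2)."""
    rows = [secrets.randbelow(2 ** n_bits) for _ in range(n_bits)]
    b = secrets.randbelow(2 ** n_bits)
    def h(x):
        y = 0
        for j, row in enumerate(rows):
            y |= parity(row & x) << j
        return y ^ b
    return h

def truncate(h, i, n_bits):
    """h_hat(x) = 0^(n-i) || h(x)_{1..i}: keep i bits of h(x), zero the rest."""
    return lambda x: h(x) >> (n_bits - i)

def toy_prf(key, x, n_bits):
    """Stand-in for the non-adaptive PRF F (an assumption of this sketch)."""
    digest = hmac.new(key, x.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)

def sample_combiner(index_set, n_bits=N_BITS):
    """Sample g from G_n: XOR of independent copies of F o H_hat^i, i in index_set."""
    parts = [(secrets.token_bytes(16), truncate(sample_affine(n_bits), i, n_bits))
             for i in index_set]
    def g(x):
        out = 0
        for key, h_hat in parts:
            out ^= toy_prf(key, h_hat(x), n_bits)
        return out
    return g
```

Each index $i$ contributes one call to the underlying PRF, so an evaluation of `g` costs $|\mathcal{I}(n)|$ calls, in line with the remark that the index set determines the number of calls to $\mathcal{F}$.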


Before proving the corollary, let us first use it for constructing an adaptive PRF
from a polynomially secure non-adaptive one.


Corollary 4 (restated from the introduction). Let $\mathcal{F}$ be a polynomially secure
non-adaptive PRF, let $\mathcal{H}$ be an efficient pairwise-independent length-preserving
function family ensemble and let $k(n) \in \omega(1)$ be a polynomial-time computable
function. Then $\mathcal{G} := \left\{\bigoplus_{i \in [k(n)]} \left(\mathcal{F}_n \circ \widehat{\mathcal{H}}_n^{\,i \cdot \lfloor \log n \rfloor}\right)\right\}_{n\in\mathbb{N}}$ is a polynomially secure
adaptive PRF.


Proof. Let $\mathcal{I}(n) := \{\lfloor \log n \rfloor, \lfloor 2 \cdot \log n \rfloor, \ldots, \lfloor k(n) \cdot \log n \rfloor\}$. Applying Corollary 3
with respect to $\mathcal{F}$, $\mathcal{H}$, $\mathcal{I}$ and $t(n) = \lfloor c \cdot \log n \rfloor$, where $c \in \mathbb{N}$, yields that $\mathcal{G}$ is an
$O(\sqrt[3]{n^c})$-adaptive PRF. It follows that $\mathcal{G}$ is a $p$-adaptive PRF for every $p \in$ poly.
Namely, $\mathcal{G}$ is a polynomially secure adaptive PRF.


Remark 1 (unknown security). Corollary 3 is also useful when the security of
$\mathcal{F}$ is "not known" at construction time. Taking $\mathcal{I}(n) = \{1, 2, 4, \ldots, 2^{\lfloor \log n \rfloor}\}$
(resulting in $\log n$ calls to $\mathcal{F}$) and assuming that $\mathcal{F}$ is found to be a $T(n)$-non-adaptive
PRF for some polynomial-time computable $T$, the resulting PRF is
guaranteed to be an $O(\sqrt[3]{T(n)})$-adaptive PRF (neglecting polynomial factors).


Proof (of Corollary 3). It is easy to see that $\mathcal{G}$ is efficient, so it is left to argue for
its security. Let $q(n) = q'(n)p(n)$, where $p$ is as in the statement of Corollary 2,
and $q' \in$ poly is to be determined later. Let $t$ be a polynomial-time computable
integer function with $t(n) \in \mathcal{I}(n)$ and $2^{t(n)} \le T(n)/q(n)$. It follows that
$\widehat{\mathcal{H}}^t = \{\widehat{\mathcal{H}}_n^{t(n)}\}_{n\in\mathbb{N}}$ is an efficient pairwise-independent function family ensemble, and
Corollary 2 yields that $\mathcal{F} \circ \widehat{\mathcal{H}}^t$ is a $\left(\sqrt[3]{q'(n)2^{t(n)}}/2\right)$-adaptive PRF.

Assume towards a contradiction that there exists an oracle-aided distinguisher
$D$ that runs in time $T'(n) = \sqrt[3]{2^{t(n)}}/2$ and

$$\left|\Pr_{g \leftarrow \mathcal{G}_n}[D^g(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[D^\pi(1^n) = 1]\right| > 1/T'(n) \qquad (3)$$


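for infinitely many $n$'s. We use the following distinguisher for breaking the
pseudorandomness of $\mathcal{F} \circ \widehat{\mathcal{H}}^t$:


Algorithm 5 ($\widehat{D}$)


Input: $1^n$.
Oracle: a function $\phi$ over $\{0,1\}^n$.


1. For every $i \in \mathcal{I}(n) \setminus \{t(n)\}$, choose $g^i \leftarrow \mathcal{F}_n \circ \widehat{\mathcal{H}}^i_n$.

2. Set $g := \phi \oplus \left(\bigoplus_{i \in \mathcal{I}(n) \setminus \{t(n)\}} g^i\right)$.

3. Emulate $D^g(1^n)$.


Note that $\widehat{D}$ can be implemented to run in time $|\mathcal{I}(n)| \cdot r(n) \cdot T'(n)$ for some $r \in$
poly, which is smaller than $\sqrt[3]{q'(n)2^{t(n)}}/2$ for large enough $q'$. Also note that in
case $\phi$ is uniformly distributed over $\Pi_n$, then $g$ (selected by $\widehat{D}^\phi(1^n)$) is uniformly
distributed in $\Pi_n$, whereas in case $\phi$ is uniformly distributed in $\mathcal{F}_n \circ \widehat{\mathcal{H}}_n^{t(n)}$,
$g$ is uniformly distributed in $\mathcal{G}_n$. It follows that

$$\left|\Pr_{g \leftarrow (\mathcal{F} \circ \widehat{\mathcal{H}}^t)_n}[\widehat{D}^g(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[\widehat{D}^\pi(1^n) = 1]\right| = \left|\Pr_{g \leftarrow \mathcal{G}_n}[D^g(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[D^\pi(1^n) = 1]\right| \qquad (4)$$

for every $n \in \mathbb{N}$. In particular, Equation (3) yields that

$$\left|\Pr_{g \leftarrow (\mathcal{F} \circ \widehat{\mathcal{H}}^t)_n}[\widehat{D}^g(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[\widehat{D}^\pi(1^n) = 1]\right| > \frac{2}{\sqrt[3]{2^{t(n)}}} \ge \frac{2}{\sqrt[3]{q'(n)2^{t(n)}}}$$

for infinitely many $n$'s, in contradiction to the pseudorandomness of $\mathcal{F} \circ \widehat{\mathcal{H}}^t$ we
proved above.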


Acknowledgment. We are very grateful to Omer Reingold for very useful 
discussions, and for challenging the second author with this research question a 
long while ago.

From Polynomial to Super-Polynomial Security


The standard security definition for cryptographic primitives is polynomial
security: any PPT trying to break the primitive has only negligible success
probability. Bellare [1] showed that for any polynomially secure primitive there exists
a single negligible function $\mu$, such that no PPT can break the primitive with
probability larger than $\mu$. Here we take his approach a step further, showing that
for a polynomially secure primitive there exists a super-polynomial function $T$,
such that no adversary of running time $T$ breaks the primitive with probability
larger than $1/T$.

In the following we identify algorithms with their string description. In par- 
ticular, when considering algorithm A, we mean the algorithm defined by the 
string A (according to some canonical representation). We prove the following 
result. 


Theorem 6. Let $v\colon \{0,1\}^* \times \mathbb{N} \to [0,1]$ be a function with the following
properties: 1) $v(A,n) < 1/p(n)$ for every oracle-aided PPT $A$, $p \in$ poly and large
enough $n$; and 2) if the distributions induced by random executions of $A^f(x)$ and
$B^f(x)$ are the same for any input $x \in \{0,1\}^n$ and function $f$ (each distribution
describes the algorithm's output and oracle queries), then $v(A,n) = v(B,n)$.

Then there exists an integer function $T(n) \in n^{\omega(1)}$ such that the following holds:
for any algorithm $A$ of running time at most $T(n)$, it holds that $v(A,n) < 1/T(n)$
for large enough $n$.


Remark 2 (Applications). Let $f$ be a polynomially secure OWF (i.e.,
$\Pr[A(f(U_n)) \in f^{-1}(f(U_n))] = \mathrm{neg}(n)$ for any PPT $A$). Applying Theorem 6 with
$v(A,n) := \Pr[A(f(U_n)) \in f^{-1}(f(U_n))]$ (where if $A$ expects to get an oracle, we
provide it with the constant function $\phi(x) = 1$), yields that $f$ is a super-polynomial
secure OWF (i.e., there exists $T(n) \in n^{\omega(1)}$ such that $\Pr[A(f(U_n)) \in f^{-1}(f(U_n))] <
1/T(n)$ for any algorithm of running time $T$ and large enough $n$).

Similarly, for a polynomially secure PRF $\mathcal{F} = \{\mathcal{F}_n\}_{n\in\mathbb{N}}$ (see Definition 5),
applying Theorem 6 with $v(A,n) := \left|\Pr_{f \leftarrow \mathcal{F}_n}[A^f(1^n) = 1] - \Pr_{\pi \leftarrow \Pi_n}[A^\pi(1^n) = 1]\right|$,
where $\Pi_n$ is the set of all functions with the same domain/range as $\mathcal{F}_n$, yields
that $\mathcal{F}$ is a super-polynomial secure PRF.


Proof (of Theorem 6). Given a probabilistic algorithm $A$ and an integer $i$, let $A_i$
denote the variant of $A$ that on input of length $n$, halts after $n^i$ steps (hence,
$A_i$ is a PPT). Let $\mathcal{S}_i$ be the first $i$ strings in $\{0,1\}^*$, according to some canonical
order, viewed as descriptions of $i$ algorithms. Let $\mathcal{I}(n) = \{i \in [n]\colon \forall A \in \mathcal{S}_i, k \ge
n\colon v(A_i, k) < 1/k^i\} \cup \{1\}$, let $t(n) = \max \mathcal{I}(n)$ and $T(n) = n^{t(n)}$.

Let $A$ be an algorithm of running time $T(n)$, and let $i_A$ be the first integer
such that $A \in \mathcal{S}_{i_A}$. In Claim 7 we prove that $t(n) \in \omega(1)$, hence it follows that
$t(n) > i_A$ for any large enough $n$. For any such $n$, the definition of $t$ guarantees
that $v(A_{t(n)}, n) < 1/n^{t(n)} = 1/T(n)$. Since $A$ is of running time $T(n)$, the second
property of $v$ yields that $v(A,n) = v(A_{t(n)}, n)$, and therefore $v(A,n) < 1/T(n)$.


Claim 7. It holds that $t(n) \in \omega(1)$.


Proof. Fix $i \in \mathbb{N}$. For each $A \in \mathcal{S}_i$, let $n_A$ be the first integer such that $v(A_i, n) <
1/n^i$ for every $n \ge n_A$ (note that such $n_A$ exists by the first property of $v$), and
let $n_i = \max\{n_A\colon A \in \mathcal{S}_i\}$. It follows that $v(A_i, n) < 1/n^i$ for every $n \ge n_i$ and
$A \in \mathcal{S}_i$, and therefore $t(n_i) \ge i$.


Abstract. We show a hardness-preserving construction of a PRF from
any length doubling PRG which improves upon known constructions 
whenever we can put a non-trivial upper bound q on the number of 
queries to the PRF. Our construction requires only O(log q) invocations 
to the underlying PRG with each query. In comparison, the number of 
invocations by the best previous hardness-preserving construction (GGM 
using Levin’s trick) is logarithmic in the hardness of the PRG. 

For example, starting from an exponentially secure PRG $\{0,1\}^n \to
\{0,1\}^{2n}$, we get a PRF which is exponentially secure if queried at most
$q = \exp(\sqrt{n})$ times and where each invocation of the PRF requires
$O(\sqrt{n})$ queries to the underlying PRG. This is much less than the $O(n)$
required by known constructions.


1 Introduction 


In 1984, the notion of pseudorandom functions was introduced in the seminal
work of Goldreich, Goldwasser and Micali [10]. Informally speaking, a
pseudorandom function (PRF) is a keyed function $F\colon \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$,
such that no efficient oracle aided adversary can distinguish whether the oracle
implements a uniformly random function, or is instantiated with $F(k,\cdot)$ for a
random key $k \leftarrow \{0,1\}^n$. PRFs can be used to realize a shared random function,
which has found many applications in cryptography [9, 7, 8, 2, 16, 15, 12].

Goldreich et al. [10] gave the first construction of a PRF from any
length-doubling pseudorandom generator $G\colon \{0,1\}^n \to \{0,1\}^{2n}$; this is known as the
GGM construction. In this work, we revisit this classical result. Although we will
state the security of all constructions considered in a precise quantitative way,
it helps to think in asymptotic terms to see the qualitative differences between
constructions. In the discussion below, we will therefore think of $n$ as a parameter
(and assume the PRG $G$ is defined for all input lengths $n \in \mathbb{N}$, and not just, say,
$n = 128$). Moreover, for concreteness we assume that $G$ is exponentially hard,
that is, for some constant $c > 0$ and all sufficiently large $n$, no adversary of size
$2^{cn}$ can distinguish $G(U_n)$ from $U_{2n}$ (where $U_n$ denotes a variable with uniform
distribution over $\{0,1\}^n$) with advantage more than $2^{-cn}$. We will also refer to
this as "$G$ having $cn$ bits of security".

The GGM construction $\mathrm{GGM}^G\colon \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ is hardness
preserving, which means that if the underlying PRG $G$ has $cn$ bits of security,
it has $c'n$ bits of security for some $0 < c' < c$. The domain size $\{0,1\}^m$ can
be arbitrary, but the efficiency of the construction depends crucially on $m$ as
every invocation of $\mathrm{GGM}^G$ requires $m$ calls to the underlying PRG $G$.

Levin proposed a modified construction which improves efficiency for long
inputs: first hash the long $m$-bit input to a short $u$-bit string using a universal
hash function $h\colon \{0,1\}^m \to \{0,1\}^u$, and only then use the GGM construction
on this short $u$-bit string. The smaller a $u$ we choose, the better the efficiency.
If we just want to achieve security against polynomial size adversaries, then a
super-logarithmic $u = \omega(\log n)$ will do. But if we care about exponential security
and want this construction to be hardness preserving, then we must choose a
$u = \Omega(n)$ that is linear in $n$. Thus, the best hardness-preserving construction
of a PRF $F^G$ from a length-doubling PRG $G$ requires $\Theta(n)$ invocations of $G$ for
every query to $F$ (unless the domain $m = o(n)$ is sublinear, in which case we can use
the basic GGM construction). In this work we ask if one can improve upon this
construction in terms of efficiency. We believe that in this generality, the answer
actually is no, and state this explicitly as a conjecture. But our main result
is a new construction which dramatically improves efficiency in many practical
settings, namely, whenever we can put a bound on the number of queries the
adversary can make.

In the discussion above, we didn’t treat the number of queries an adversary can 
make as a parameter. Of course, the size of the adversary considered is an upper 
bound on the number of queries, but in many practical settings, the number of 
outputs an adversary can see is tiny compared to its computational resources. 

For example, consider an adversary of size $2^{cn}$ who can make only $q = 2^{\sqrt{n}} \ll
2^{cn}$ queries to the PRF. If the domain of the PRF is small, $m = O(\sqrt{n})$, then
using GGM we get a hardness-preserving construction with efficiency $O(\sqrt{n})$
(where efficiency is measured by the number of queries to $G$ per invocation of
the PRF). If we want a larger domain $m = \omega(\sqrt{n})$, then the efficiency drops to
$m = \omega(\sqrt{n})$. We can get efficiency $\Theta(n)$ regardless of how large $m$ is by using
Levin's trick, but cannot go below that without sacrificing hardness preservation.

In this paper we give a hardness-preserving construction which, for any
input length $m$, achieves efficiency $O(\sqrt{n})$. The construction works also for other
settings of the parameters. In particular, for $q = 2^{n^\epsilon}$ (note that above we
considered the case $\epsilon = 1/2$) we get a construction with efficiency $O(\log q) = O(n^\epsilon)$.
Actually, this is only true for $\epsilon \ge 1/2$; whether there exists a hardness-preserving
black-box construction with efficiency $O(\log q)$ for $q = 2^{n^\epsilon}$ where $\epsilon < 1/2$ is an
interesting open question.

Other Applications. Although we described our result as an improved reduction
from PRFs to PRGs, the main idea is more general. Viewing it differently, ours is
a new technique for extending the domain of a PRF. If we apply our technique to
PRFs with an input domain of length $\ell$ bits, Levin's trick would require roughly
a domain of size $\ell^2$ to achieve a comparable quality of hardness preservation.

This technique can be used to give more efficient constructions in other
settings, for example to the work of Naor and Reingold [18], who construct PRFs
computable by low depth circuits from a so-called pseudorandom synthesizer
(PRS), which is an object stronger than a PRG, but weaker than a full-blown
PRF. Very briefly, [18] gives a hardness-preserving construction of a PRF from
a PRS which can be computed making $O(n)$ queries to the PRS in depth $O(\log n)$
(GGM also makes $O(n)$ queries, but sequentially, i.e. has depth $O(n)$; on the
other hand, GGM only needs a PRG, not a PRS as building block). Our domain
extension technique can also be used to improve on the Naor-Reingold
construction, and improves efficiency from $O(n)$ to $O(\log q) = O(n^\epsilon)$ whenever one can
put an upper bound $q = 2^{n^\epsilon}$ ($\epsilon \ge 1/2$) on the number of adversarial queries.

Subsequent to [18], several number-theoretic constructions of PRFs have been 
proposed, inspired by the PRS based construction and GGM [19[20{74)5]7]. In 
particular, in [19], Naor and Reingold gave an efficient construction of a PRF 
from the DDH assumption that requires only n multiplications and one expo- 
nentiation instead of the n exponentiations required for GGM or the PRS based 
construction. This is achieved by exploiting particular properties of the underly- 
ing assumptions like the self reducibility of DDH. Our technique does not seem 
to be directly applicable to improve upon these constructions [19]. 


The Construction. Before we describe our construction in more detail, it is
instructive to see why the universal hash function $h\colon \{0,1\}^m \to \{0,1\}^u$ used for
Levin's trick must have range $u = \Omega(n)$ to be hardness-preserving. Consider any
two queries $x_i$ and $x_j$ made by the adversary. If we have a collision $h(x_i) = h(x_j)$
for the initial hashing, then the outputs $\mathrm{GGM}^G(k, h(x_i)) = \mathrm{GGM}^G(k, h(x_j))$ of
the PRF will also collide. To get exponential security, we need this collision
probability to be exponentially small. The probability for such a collision depends
on the range $u$ and is $\Pr_h[h(x_i) = h(x_j)] = 2^{-u}$. So we must choose $u = \Omega(n)$
to make this term exponentially small.

Similar to Levin's trick, we also use a hash function $h\colon \{0,1\}^m \to \{0,1\}^t$
to hash the input down to $t = 3 \log q$ bits (recall that $q$ is an upper bound
on the queries to the PRF, so if, say, $q = 2^{\sqrt{n}}$, then $t = 3\sqrt{n}$). As discussed
earlier, the collision probability with such a short output length will not be
exponentially small. However, we can prove something weaker, namely, if $h$ is
$t$-wise independent, then the probability that we have a $(t+1)$-wise collision (i.e.,
any $t+1$ of the $q$ inputs hash down to the same value) is exponentially small.

Next, the hashed value $x'_i = h(x_i)$ is used as input to the standard GGM PRF
to compute $x''_i := \mathrm{GGM}^G(k, x'_i)$. Note, however, that we can't simply set $x''_i$ as
the output of our PRF because several of the inputs $x_1, \ldots, x_q$ can be mapped
by $h$ to the same $x'$, and thus also the same $x''$, which would not look random
at all.

We solve this problem by using $x''_i = \mathrm{GGM}^G(k, h(x_i))$ to sample a $t$-wise
independent hash function $h_i$. The final output $z_i := h_i(x_i)$ is then computed by
hashing the original input $x_i$ using this $h_i$. Note that with very high probability,
for every $i$, at most $t' \le t$ different $x_{i_1}, \ldots, x_{i_{t'}}$ will map to the same $t$-wise
independent $h_i$. Thus, the corresponding outputs $h_i(x_{i_1}), \ldots, h_i(x_{i_{t'}})$ will be
random.

The invocation of the GGM construction and the sampling of $h_i$ from $x''_i$ can
both be done with $O(t)$ invocations of $G$, thus we get an overall efficiency of
$O(\sqrt{n})$.


2 Preliminaries 


Variables, Sets and Sampling. By lowercase letters we denote values and bit
strings, by uppercase letters we denote random variables and by uppercase
calligraphic letters we denote sets. Specifically, by $U_m$ we denote the random variable
which takes values uniformly at random from the set of bit strings of length $m$
and by $\mathcal{R}_{m,n}$ the set of all functions $F\colon \{0,1\}^m \to \{0,1\}^n$. If $\mathcal{X}$ is a set, then by
$\mathcal{X}^t$ we denote the $t$'th direct product of $\mathcal{X}$, i.e., $(\mathcal{X}_1, \ldots, \mathcal{X}_t)$ of $t$ identical copies
of $\mathcal{X}$. If $X$ is a random variable, then by $X^{(t)}$ we denote the random variable
which consists of $t$ independent copies of $X$. By $x \leftarrow X$ we denote the fact that
$x$ was chosen according to the random variable $X$ and analogously by $x \leftarrow \mathcal{X}$,
that $x$ was chosen uniformly at random from the set $\mathcal{X}$.


Computational/Statistical Indistinguishability. For random variables $X_0, X_1$
distributed over some set $\mathcal{X}$, we write $X_0 \sim X_1$ to denote that they are identically
distributed, we write $X_0 \sim_\delta X_1$ to denote that they have statistical distance $\delta$,
i.e. $\frac{1}{2} \sum_{x \in \mathcal{X}} |\Pr_{X_0}[x] - \Pr_{X_1}[x]| \le \delta$, and $X_0 \sim_{(\delta,s)} X_1$ to denote that they are
$(\delta, s)$ indistinguishable, i.e. for all distinguishers $D$ of size at most $|D| \le s$ we
have $|\Pr_{X_0}[D(x) \to 1] - \Pr_{X_1}[D(x) \to 1]| \le \delta$. In informal discussions we
will also use $\sim_s$ to denote statistical closeness (i.e. $\sim_\delta$ for some "small" $\delta$) and
$\sim_c$ to denote computational indistinguishability (i.e. $\sim_{(\delta,s)}$ for some "large" $s$
and "small" $\delta$).


3 Definitions 


We will need two information theoretic notions of hash functions, namely,
$\delta$-universal and $t$-wise independent hash functions. Informally, a hash function is
$t$-wise independent if its output is uniform on any $t$ distinct inputs. A function
is $\delta$-universal if any two inputs collide with probability at most $\delta$.


Definition 1 (almost universal hash function). For $\ell, m, n \in \mathbb{Z}$, a function
$h\colon \{0,1\}^\ell \times \{0,1\}^m \to \{0,1\}^n$ is $\delta$-almost universal if for any $x \ne x' \in \{0,1\}^m$

$$\Pr_{k \leftarrow \{0,1\}^\ell}[h_k(x) = h_k(x')] \le \delta$$

Universal hash functions were studied in [6, 21], who also gave explicit
constructions.


Proposition 1. For any $m, n$ there exists a $2^{-n+1}$-universal hash function with
key length $\ell = 4(n + \log m)$. Further, no such function can be $\delta$-universal for
$\delta < 2^{-n-1}$.


Definition 2 (t-wise independent hash function family). For $\ell, m, n, t \in
\mathbb{Z}$, a function $h\colon \{0,1\}^\ell \times \{0,1\}^m \to \{0,1\}^n$ is $t$-wise independent, if for every
$t$ distinct inputs $x_1, \ldots, x_t \in \{0,1\}^m$ and a random key $k \leftarrow \{0,1\}^\ell$ the outputs
are uniform, i.e.

$$h_k(x_1) \,\|\, \ldots \,\|\, h_k(x_t) \sim U_n^{(t)}$$


Proposition 2. For any $t, m, n \le m$ there exists a $t$-wise independent hash
function with key length $\ell = m \cdot t$.
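A standard way to obtain $t$-wise independence with a key of $t$ field elements is to evaluate a random polynomial of degree below $t$. The sketch below is an illustration rather than the construction behind Proposition 2: it works over a small prime field $\mathbb{Z}_p$ instead of bit strings, so that the definition can be checked exhaustively, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def poly_hash(key, x, p):
    """Evaluate key[0] + key[1]*x + ... + key[-1]*x^(len(key)-1) mod p (Horner)."""
    acc = 0
    for coeff in reversed(key):
        acc = (acc * x + coeff) % p
    return acc

def is_t_wise_independent(key_len, t, p):
    """Check that any t distinct inputs are mapped to a uniform t-tuple."""
    keys = list(product(range(p), repeat=key_len))
    for inputs in combinations(range(p), t):
        outputs = Counter(tuple(poly_hash(k, x, p) for x in inputs) for k in keys)
        # Uniformity: all p^t output tuples occur, each equally often.
        if len(outputs) != p ** t or len(set(outputs.values())) != 1:
            return False
    return True
```

A key of three coefficients gives 3-wise (and hence 2-wise) independence, while two coefficients are not enough for 3-wise independence, which matches the key length growing linearly in $t$.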


Remark 1. Note that 2-wise independence implies $2^{-n}$-universality. The reason
to consider the notion of $\delta$-universality for $\delta > 2^{-n}$ is that it can be achieved
with keys of length linear in the output, as opposed to the input.


Definition 3 (PRG [4, 22]). A length-increasing function $G\colon \{0,1\}^n \to \{0,1\}^m$
($m > n$) is a $(\delta, s)$-hard pseudorandom generator if

$$G(U_n) \sim_{(\delta,s)} U_m$$

We say $G$ has $\sigma$ bits of security if $G$ is $(2^{-\sigma}, 2^{\sigma})$-hard. $G$ is exponentially
hard if it has $cn$ bits of security for some $c > 0$, and $G$ is sub-exponentially
hard if it has $cn^\epsilon$ bits of security for some $c > 0, \epsilon > 0$.


The following lemma, which follows from a standard hybrid argument, will be 
useful. 


Lemma 1. If $G\colon \{0,1\}^n \to \{0,1\}^m$ is a $(\delta, s)$-hard PRG of size $|G| = s'$, then
for any $q \in \mathbb{N}$

$$G(U_n)^{(q)} \sim_{(q \cdot \delta,\; s - q \cdot s')} U_m^{(q)}$$


Definition 4 (PRF [10]). A function $F\colon \{0,1\}^\ell \times \{0,1\}^m \to \{0,1\}^n$ is a
$(q, \delta, s)$-hard pseudorandom function (PRF) if for every oracle aided
distinguisher $D^*$ of size $|D^*| \le s$ making at most $q$ oracle queries

$$\left|\Pr_{k \leftarrow \{0,1\}^\ell}[D^{F(k,\cdot)} \to 1] - \Pr_{f \leftarrow \mathcal{R}_{m,n}}[D^{f(\cdot)} \to 1]\right| \le \delta$$

$F$ has $\sigma$ bits of security against $q$ queries if $F$ is $(q, 2^{-\sigma}, 2^{\sigma})$ secure.


If $q$ is not explicitly specified, it is unbounded (the size $2^{\sigma}$ of the distinguisher
considered is a trivial upper bound on $q$).

3.1 The GGM Construction


Goldreich, Goldwasser and Micali [10] gave the first construction of a PRF from 
any length doubling PRG. We describe their simple construction below. 

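For a length-doubling function $G\colon \{0,1\}^n \to \{0,1\}^{2n}$ and $m \in \mathbb{N}$, let
$\mathrm{GGM}^G\colon \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ denote the function

$$\mathrm{GGM}^G(k, x) = k_x \quad \text{where } k_x \text{ is recursively defined as } k_\varepsilon = k \text{ and } k_{a\|0} \,\|\, k_{a\|1} := G(k_a)$$


Proposition 3 ([10]). If $G$ is a $(\delta_G, s_G)$-hard PRG, then for any $m, q \in \mathbb{N}$,
$\mathrm{GGM}^G\colon \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ is a $(q, \delta, s)$-hard PRF where

$$\delta = m \cdot q \cdot \delta_G \qquad s = s_G - q \cdot m \cdot |G| \qquad (1)$$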
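The recursive definition of $k_x$ amounts to walking down a binary tree, one call to $G$ per input bit. The following sketch makes this concrete; the length-doubling function `G` below is built from SHA-256 purely as a stand-in so that the code runs, and is an assumption of this sketch rather than a PRG from the text.

```python
import hashlib

N_BYTES = 16  # toy seed length n, in bytes (hypothetical)

def G(seed):
    """Length-doubling stand-in for the PRG: n bytes in, 2n bytes out."""
    return hashlib.sha256(seed).digest()[: 2 * N_BYTES]

def ggm(key, x_bits):
    """GGM^G(k, x): follow the bits of x down the tree, one call to G per bit."""
    node = key              # k_epsilon = k
    calls = 0
    for bit in x_bits:
        expanded = G(node)  # k_{a||0} || k_{a||1} := G(k_a)
        calls += 1
        node = expanded[:N_BYTES] if bit == "0" else expanded[N_BYTES:]
    return node, calls
```

The returned call count equals the input length $m$, which is the efficiency measure used throughout the discussion above.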


3.2 Levin’s Trick 


One invocation of the GGM construction $\mathrm{GGM}^G\colon \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$
requires $m$ invocations of the underlying PRG $G$, so the efficiency of the PRF
depends linearly on the input length $m$. Levin observed that the efficiency can
be improved if one first hashes the input using a universal hash function. Using
this trick one gets a PRF on long $m$-bit inputs at the cost of evaluating a PRF
on "short" $u$-bit inputs plus the cost of hashing the $m$-bit string down to $u$ bits.


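Proposition 4 (Levin's trick). Let $h\colon \{0,1\}^\ell \times \{0,1\}^m \to \{0,1\}^u$ be a
$\delta_h$-universal hash function and $F\colon \{0,1\}^n \times \{0,1\}^u \to \{0,1\}^n$ be a $(q, \delta_F, s_F)$-hard
PRF, then the function $F^h\colon \{0,1\}^{n+\ell} \times \{0,1\}^m \to \{0,1\}^n$ defined as

$$F^h(k_F \| k_h, x) := F(k_F, h(k_h, x))$$

is a $(q, \delta, s)$-hard PRF where

$$\delta = q^2 \cdot \delta_h + \delta_F \qquad s = s_F - q \cdot |h| \qquad (2)$$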
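Levin's trick is a two-step composition: hash the long input with a universal hash, then apply a PRF to the short digest. The sketch below illustrates the shape of $F^h$ only. The polynomial-evaluation hash over a Mersenne prime and the HMAC-based stand-in for the short-input PRF are assumptions of this sketch, not the instantiations used in the text.

```python
import hashlib
import hmac

P = 2 ** 61 - 1  # a Mersenne prime; hash outputs live in Z_P (hypothetical choice)

def universal_hash(k_h, blocks):
    """Polynomial-evaluation hash of a block sequence, evaluated at the key k_h."""
    acc = 0
    for block in blocks:
        acc = (acc + block) * k_h % P  # Horner step
    return acc

def short_prf(k_f, u):
    """Stand-in for the PRF F on short inputs (an assumption of this sketch)."""
    return hmac.new(k_f, u.to_bytes(8, "big"), hashlib.sha256).digest()[:16]

def levin_prf(key, blocks):
    """F^h(k_F || k_h, x) := F(k_F, h(k_h, x))."""
    k_f, k_h = key
    return short_prf(k_f, universal_hash(k_h, blocks))
```

For two distinct inputs of the same number $L$ of blocks (each block below `P`), the difference of their hashes is a nonzero polynomial of degree at most $L$ in the key, so they collide for at most $L$ of the `P` possible keys; this plays the role of $\delta_h$ in equation (2).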


3.3 Hardness Preserving and Good Constructions 


Definition 5 (Hardness Preserving Construction). A construction $F^{*}$ of
a PRF from a PRG is hardness preserving up to $q = q(n)$ queries, if for
every constant $c > 0, \epsilon > 0$ there is a constant $c' > 0$ and $n' \in \mathbb{Z}$ such that for all
$n \ge n'$: if $G$ is of polynomial size and has $cn^\epsilon$ bits of security, $F^G$ has $c'n^\epsilon$ bits of
security against $q$ queries. It is hardness preserving if it is hardness preserving
for any $q$.

If the above holds for every $c' < c$, we say that it is strongly hardness
preserving.


1 As universal hash functions are non-cryptographic primitives, hashing is generally
much cheaper than evaluating pseudorandom objects.

Proposition 5. The GGM construction is hardness preserving; more concretely,
(1) if $G$ has $cn^{\epsilon}$ bits of security, $\mathrm{GGM}^G$ has $c'n^{\epsilon}$ bits of security for any $c' < c/2$;
(2) $\mathrm{GGM}^G$ for $q = 2^{n^{\epsilon'}}$, $\epsilon' < \epsilon$, queries is strongly hardness preserving.


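Proof. By eq. (1), if $G$ has $cn^{\epsilon}$ bits of security, then the GGM construction has

$$\min\{cn^{\epsilon} - \log(q) - \log(m),\; cn^{\epsilon} - \log|G| - \log(m)\} \qquad (3)$$


bits of security. To see (1), we observe that for any $c' < c/2$, eq. (3) is at least $c'n^{\epsilon}$ for
sufficiently large $n$, as required (using that $m$ and $|G|$ are polynomial in $n$ and
$q = 2^{\sigma}$, where $\sigma = c'n^{\epsilon}$ is the trivial upper bound on $\log q$ from Definition 4.)
To see (2), observe that for $\log(q) = n^{\epsilon'}$ where $\epsilon' < \epsilon$, the term eq. (3)
is at least $c'n^{\epsilon}$ for sufficiently large $n$ and every $c' < c$.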


Recall that one invocation of GGM requires $m$ invocations of the underlying
PRG, where $m$ must be at least $\lceil \log(q) \rceil$. We conjecture that $\Omega(\log(q))$
invocations are necessary for any hardness preserving construction.


Conjecture 1. Any construction $F^G(\cdot,\cdot) : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ that
preserves hardness for $q$ queries and has a black-box security proof must make
$\Omega(\log q)$ invocations to $G$ per invocation of $F^G$.


In the appendix we give some intuition as to why we believe this conjecture holds.
We show that the standard black-box security proof technique, as used e.g. for
GGM, will not work for constructions making $o(\log q)$ invocations.


Definition 6 ((Very) Good Construction). We call a construction as in
Definition 5 good for $q$ queries, if it is hardness preserving up to $q$ queries and
each invocation of $F^G$ results in $O(\log q)$ invocations of $G$. We call it very good
if it is even strongly hardness preserving.


Thus, $\mathrm{GGM}^G$ is good as long as the domain length $m$ is in $O(\log q)$, but not if we need
a large domain $m = \omega(\log q)$. Let's look at Levin's construction.


Proposition 6. The GGM construction with Levin's trick $\mathrm{GGM}^G_h$ (with
$h : \{0,1\}^{\ell'} \times \{0,1\}^m \to \{0,1\}^u$ as in Proposition 4) is hardness preserving if and
only if $u$ is linear in the security of the underlying PRG (e.g. $u$ has to be linear
in $n$ if $G$ is exponentially hard.)


Proof. For concreteness, we assume $G$ is exponentially hard; the proof is easily
adapted to the general case. The number of queries to $G$ per invocation of $\mathrm{GGM}^G_h$
is $u$, where $\{0,1\}^u$ is the range of the $\delta_h$-universal hash function. By eq. (2),
$\mathrm{GGM}^G_h$ has security $\delta = q^2 \cdot \delta_h + \delta_{\mathrm{GGM}^G}$. To preserve exponential hardness, $\delta$
must be exponentially small. So also $\delta_h \le \delta$ must be exponentially small. By
Proposition [], $\delta_h \ge 2^{-u-1}$, thus $u$ must be linear in $n$.


2 We restrict the key length to $n$ bits. This is not much of a restriction, as one can use
$G$ to expand the key. If we allow polynomially sized keys directly, then the conjecture
would be wrong for polynomial $q$, as the key could just contain the entire function
table.

Summing up, the GGM construction is hardness preserving for any $q$, but only
good if the domain is restricted to $m = O(\log q)$ bits. By using Levin's trick,
we can get a hardness preserving construction where $u = O(n)$ (if $G$ is
exponentially hard), but this will only be a good construction for $q$ queries if $q$ is also
exponentially large.

In a practical setting we often know that a potential adversary will never see
more than, say, $2^{\sqrt{n}}$ outputs of a PRF $F^G$. If we need a large domain for the
PRF, and would like the construction to preserve the exponential hardness of
the underlying PRG $G$, then the best we can do is to use GGM with Levin's
trick, which will invoke $G$ a number of times linear in $n$ with every query. Can we
do better? If Conjecture 1 is true, then one needs $\Omega(\sqrt{n})$ invocations, which is
much better than $O(n)$. The main result in this paper is a construction matching
this (conjectured) lower bound.


4 Our Construction 


Fig. 1. The leftmost figure illustrates our construction $C^G(\cdot,\cdot)$ using key $k = k_0 \| k_1$ on
input $x$. The numbers $3n, t \cdot 3n, \ldots$ on the left indicate the bit-length of the
corresponding values $x, k_0, \ldots$. The remaining figures illustrate the games $C_1, C_2, C_3$ and
$C_4 = R_{3n,3n}$ from the proof of Theorem 1. $t = 3\log(q)$ is a parameter which depends on
the number of queries $q$ we allow.

Let $G : \{0,1\}^n \to \{0,1\}^{2n}$ be a length-doubling function. For $e \in \mathbb{N}$, we denote
with $G^e : \{0,1\}^n \to \{0,1\}^{en}$ the function that expands an $n$-bit string to an
$en$-bit string using $e - 1$ invocations of $G$ (this can be done sequentially, or in
parallel in depth $\lceil \log e \rceil$.) We will use the following simple lemma, which follows
by a standard hybrid argument.


Lemma 2. Let $G$ be a $(\delta, s)$-hard PRG, then $G^e$ is a $(e \cdot \delta, s - e \cdot |G|)$-hard PRG.


Further, our construction uses a $t$-wise independent (cf. Proposition []) hash
function $h : \{0,1\}^{t \cdot 3n} \times \{0,1\}^{3n} \to \{0,1\}^{3n}$.


Our construction $C^G : \{0,1\}^{t \cdot 3n + n} \times \{0,1\}^{3n} \to \{0,1\}^{3n}$ of a PRF, which will be
good for large ranges of $q$, on input a key $k = k_0 \| k_1$ (where $k_0 \in \{0,1\}^{t \cdot 3n}$ and
$k_1 \in \{0,1\}^n$) and $x \in \{0,1\}^{3n}$, computes the output as ($X_{|t}$ denotes the $t$-bit
prefix of $X$)

$$C^G(k,x) = h\big(G^{3t}(\mathrm{GGM}^G(k_1, h(k_0,x)_{|t})),\, x\big)$$
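The data flow of the construction can be traced in a toy Python sketch. Everything concrete here is an assumption for illustration only: `G` is a SHAKE-256 stand-in for the PRG, the $t$-wise independent hash is a degree-$(t-1)$ polynomial over the prime field $\mathbb{Z}_P$ (with $P$ the 192-bit NIST P-192 prime standing in for $\{0,1\}^{3n}$ at $n = 64$), and the expanded string is reduced modulo $P$ to obtain hash keys, which introduces a small bias that a careful instantiation would avoid.

```python
import hashlib

N = 8                        # toy seed length in bytes (n = 64 bits)
P = 2**192 - 2**64 - 1       # NIST P-192 prime, stands in for {0,1}^{3n}

def G(seed: bytes) -> bytes:
    """Hypothetical length-doubling stand-in: n bytes in, 2n bytes out."""
    return hashlib.shake_256(seed).digest(2 * N)

def ggm(key: bytes, bits: str) -> bytes:
    """GGM^G on a short input given as a bit string."""
    for b in bits:
        out = G(key)
        key = out[:N] if b == "0" else out[N:]
    return key

def G_e(seed: bytes, e: int) -> bytes:
    """G^e: expand n bytes to e*n bytes with e-1 sequential calls to G."""
    blocks = []
    for _ in range(e - 1):
        out = G(seed)
        blocks.append(out[:N])
        seed = out[N:]
    blocks.append(seed)
    return b"".join(blocks)

def h(coeffs: list, x: int) -> int:
    """Polynomial hash over Z_P; coeffs[i] multiplies x**i (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def C(k0: list, k1: bytes, x: int, t: int) -> int:
    inner = h(k0, x) >> (192 - t)               # t-bit prefix of h(k0, x)
    seed = ggm(k1, format(inner, f"0{t}b"))     # GGM on the short t-bit input
    stream = G_e(seed, 3 * t)                   # 3tn bits: a fresh hash key
    key = [int.from_bytes(stream[3 * N * i: 3 * N * (i + 1)], "big") % P
           for i in range(t)]
    return h(key, x)                            # outer hash, keyed per prefix

print(hex(C(list(range(1, 7)), bytes(N), 12345, 6)))
```

Note that only the $t$-bit prefix enters GGM, so the number of calls to `G` depends on $t$ and not on the domain size.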


Remark 2 (About the domain size and key-length). This construction has a
domain of $3n$ bits and key-length $t \cdot 3n + n$. We can use Levin's trick to expand the
domain to any $m$ bits, and this will not affect the fact that the construction is
good: by eq. (2) we get an additional $q^2 \cdot \delta_h = q^2/2^{3n-1}$ term in the distinguishing
advantage, which can be ignored compared to the other terms.

We can also use a short $n$-bit key (like in plain GGM) and then expand it
to a longer $t \cdot 3n + n$ bit key with every invocation (if we use Levin's trick we
will need an extra 4(3n + log m) bits.) This also will preserve the fact that the
construction is good.


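Theorem 1 (Main Theorem). If $G$ is a $(\delta_G, s_G)$-hard PRG, then $C^G$ is a
$(q,\delta,s)$-secure PRF where


$$\delta = 4 \cdot q \cdot t \cdot \delta_G + q^2/2^n + q^t/2^{t^2}, \qquad s = s_G - q \cdot |C^G| - q \cdot 3 \cdot t \cdot n \cdot |G| \qquad (4)$$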


Before we prove this theorem, let's see what it implies for some concrete
parameters. Assume $G$ is $(\delta_G = 2^{-cn}, s_G = 2^{cn})$-hard and we want security against
$q = 2^{\sqrt{n}}$ queries. If we set $t := 3\log q$ then the construction is very good (cf.
Def. 6):


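– It makes $7t = 21\log(q) = O(\log q)$ invocations to $G$ per query to $C^G$.
– $C^G$ is strongly hardness preserving for $q = 2^{\sqrt{n}}$ queries. Substituting
  $q = 2^{\sqrt{n}}$ and $t = 3\sqrt{n}$ into eq. (4) we get


  $$\delta \le 2^{-cn+2+\sqrt{n}+\log(3\sqrt{n})} + 2^{-n+2\sqrt{n}} + 2^{-6n}, \qquad s \ge 2^{cn} - 2^{\sqrt{n}} \cdot |C^G| - 2^{\sqrt{n}} \cdot 9n\sqrt{n} \cdot |G|$$


  If $|C^G|$ is polynomial in $n$ (which is the only case we care about), then by
  the above equation, for every $c' < c$, we have $\delta \le 2^{-c'n}$ and $s \ge 2^{c'n}$ for
  sufficiently large $n$, as required by Definition 5.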
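To get a feeling for the sizes involved, the three terms of $\delta$ in eq. (4) can be evaluated in the exponent. The sketch below is only a numeric illustration; the parameter values $n = 1024$ and $c = 0.5$ are arbitrary choices, not taken from the paper.

```python
import math

def log2_delta_terms(n: int, c: float, log2_q: float, t: float):
    """log2 of the three terms of delta in eq. (4), for delta_G = 2^(-c*n)."""
    first = 2 + log2_q + math.log2(t) - c * n   # 4 * q * t * delta_G
    second = 2 * log2_q - n                     # q^2 / 2^n
    third = t * log2_q - t * t                  # q^t / 2^(t^2)
    return first, second, third

n, c = 1024, 0.5                # illustrative values (assumptions)
log2_q = math.sqrt(n)           # q = 2^sqrt(n)
t = 3 * log2_q                  # t = 3 log q
print(log2_delta_terms(n, c, log2_q, t))
```

For these values the third exponent equals $-6n$, matching $q^t/2^{t^2} = 2^{-6\log^2 q}$ when $t = 3\log q$.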


The above argument works for any $q = 2^{n^{\epsilon}}$ where $0.5 \le \epsilon < 1$. It also works if
$q$ is unbounded (i.e. $\epsilon = 1$), but we only get normal (and not strong) hardness
preservation. The argument fails for $\epsilon < 0.5$, that is whenever $q = 2^{o(\sqrt{n})}$.
Technically, the reason is that the $q^t/2^{t^2}$ term in eq. (4) is not exponentially small (as
required for hardness preservation) when we set $t = O(\log q)$ (as required for a
good construction.) It is an interesting open question if an optimal and hardness
preserving construction with range $\omega(\log(q))$ exists for any $q = 2^{o(\sqrt{n})}$. Summing
up, we get the following corollary of Theorem 1.


Corollary 1. For any $0 < \delta \le 1$ and $\epsilon \in [\delta/2, \delta)$, the construction $C^G$ (setting
$t := 3\log(q)$) is very good for $q = 2^{n^{\epsilon}}$ queries for any $G$ with $cn^{\delta}$ bits of security
(for any $c > 0$.) It is good for $\epsilon = \delta$.


Proof (of Theorem 1). Let $D^*$ be any $q$-query distinguisher of size $s$. We
denote with $C_0$ our construction $C^G$, with $C_4 = R_{3n,3n}$ a random function and
with $C_1, C_2, C_3$ intermediate constructions as shown in Figure 1. With $p(i)$ we
denote the probability that $D^{C_i(\cdot)}$ outputs 1 (where e.g. in $C_0$ the probability
is over the choice of $k_0 \| k_1$, in $C_1$ the probability is over the choice of $k_0$ and
$f \leftarrow R_{t,n}$, etc.)

Note that the advantage of $D^*$ in breaking $C^G$ is $\delta = |p(0) - p(4)|$; to prove the
theorem we will show that


$$|p(0) - p(4)| = \Big| \sum_{i=0}^{3} \big(p(i) - p(i+1)\big) \Big| \le \sum_{i=0}^{3} |p(i) - p(i+1)| \le 4 \cdot q \cdot t \cdot \delta_G + q^2/2^n + q^t/2^{t^2}$$


The last step follows from the four claims below.


Claim. $|p(0) - p(1)| \le q \cdot t \cdot \delta_G$
i=0 


Proof (of Claim). Assume $|p(0) - p(1)| > q \cdot t \cdot \delta_G$. We will construct a
distinguisher $D_1$ for $\mathrm{GGM}^G$ and $R_{t,n}$, which is of size $s_G - q \cdot t \cdot |G|$ and has
advantage $> q \cdot t \cdot \delta_G$, contradicting Proposition 3. $D_1^{O(\cdot)}$ chooses a random
$k_0 \in \{0,1\}^{t \cdot 3n}$, and then runs $D^*$ where it answers its oracle queries by simulating $C^G$
(using $k_0$), but replacing the $\mathrm{GGM}^G$ invocation with its oracle $O(\cdot)$. In the end
$D_1$ outputs the same as $D^*$. If $O(\cdot) = \mathrm{GGM}^G(k_1, \cdot)$ (for some random $k_1$) then this
simulates $C_0$, and if $O(\cdot) = R_{t,n}$ it will simulate $C_1$. Thus $D_1$ will distinguish
$\mathrm{GGM}^G$ and $R_{t,n}$ with exactly the same advantage $> q \cdot t \cdot \delta_G$ that $D^*$ has for $C_0$
and $C_1$.


Claim. $|p(1) - p(2)| \le q \cdot 3 \cdot t \cdot \delta_G$


Proof (of Claim). Assume $|p(1) - p(2)| > q \cdot 3 \cdot t \cdot \delta_G$. We will construct
a distinguisher $D_2$ of size $s_G - q \cdot 3 \cdot n \cdot t \cdot |G|$ which can distinguish
$q$-tuples of samples of $U_{3tn}$ from $G^{3t}(U_n)$ with advantage $> q \cdot 3 \cdot t \cdot \delta_G$. Using a
standard hybrid argument this then gives a distinguisher which distinguishes
a single sample of $U_{3tn}$ from $G^{3t}(U_n)$ with advantage $> q \cdot 3 \cdot t \cdot \delta_G / q = 3 \cdot t \cdot \delta_G$,
contradicting Lemma 2.

$D_2$, on input $v_1, \ldots, v_q \in \{0,1\}^{3tn}$, runs $D^*$ and answers its oracle queries by
simulating $C_1$, but replacing the output of $G^{3t}$ with the $v_i$'s (using a fresh $v_i$
for every query, except if $x'$ appeared in a previous query, in which case it uses the same
$v_i$ as in this previous query.) If the $v_i$'s have distribution $G^{3t}(U_n)$ this perfectly
simulates $C_1$, and if they have distribution $U_{3tn}$, this simulates $C_2$. So $D_2$ has the
same distinguishing advantage as $D^*$ has for $C_1$ and $C_2$.

The proofs of the final two claims are completely information theoretic.

Claim. $|p(2) - p(3)| \le q^2/2^n$


Proof (of Claim). We claim that the distinguishing advantage of any (even
computationally unbounded) $q$-query distinguisher for $C_2$ and $C_3$ is $\le q^2/2^n$.

We get $C_3$ from $C_2$ by replacing the two nested uniformly random functions
$f_2(\cdot) = R_{n,t \cdot 3n}(R_{t,n}(\cdot))$ with a single $f_3(\cdot) = R_{t,t \cdot 3n}(\cdot)$. As each invocation of $C_2$
results in exactly one invocation of $f_2(\cdot)$, the distinguishing advantage of the best
$q$-query distinguisher for $C_2$ from $C_3$ can be upper bounded by the distinguishing
advantage of the best such distinguisher for $f_2(\cdot)$ and $f_3(\cdot)$.

Let $\mathcal{E}$ denote the event that the $q$ distinct queries $x'_1, \ldots, x'_q$ to
$f_2(\cdot) = R_{n,t \cdot 3n}(R_{t,n}(\cdot))$ do not contain a collision on the inner function, i.e.
$R_{t,n}(x'_i) \ne R_{t,n}(x'_j)$ for all $i \ne j$. Conditioned on $\mathcal{E}$, the outputs of $f_2$ and $f_3$ have the
same distribution, namely the uniform one. Using this observation, we can bound (using
e.g. Theorem 1.(i) in [17] or the "fundamental Lemma" from [3]) the
distinguishing advantage of any $q$-query distinguisher for $f_2$ and $f_3$ by the probability
that one can make the event $\mathcal{E}$ fail. This is the probability that $q$ uniformly
random elements from $\{0,1\}^n$ (i.e. the outputs of the inner function) contain a
collision. This probability can be upper bounded as $q^2/2^n$.
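The collision bound used in the last step can be checked numerically. The sketch below computes the exact probability that $q$ uniform samples from an $n$-bit range are not all distinct and compares it with $q^2/2^n$; the values of `q` and `n` are arbitrary small numbers chosen for illustration.

```python
from fractions import Fraction

def collision_probability(q: int, bits: int) -> Fraction:
    """Exact probability that q uniform samples from {0,1}^bits collide."""
    size = 2**bits
    no_collision = Fraction(1)
    for i in range(q):
        no_collision *= Fraction(size - i, size)   # i-th sample avoids i values
    return 1 - no_collision

q, n = 20, 16                                      # illustrative values
exact = collision_probability(q, n)
bound = Fraction(q * q, 2**n)
print(float(exact), float(bound))
```

The exact value is at most $q(q-1)/2^{n+1}$ by the union bound over pairs, so $q^2/2^n$ is a convenient (slightly loose) upper bound.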


Claim. $|p(3) - p(4)| \le q^t/2^{t^2}$


Proof (of Claim). We claim that the distinguishing advantage of any (even
computationally unbounded) $q$-query distinguisher for $C_3$ and $C_4$ is $\le q^t/2^{t^2}$. We
will first prove this only for non-adaptive distinguishers; security against
adaptive adversaries will then follow by a result from [17].

Let $x_1, \ldots, x_q$ denote the $q$ distinct queries non-adaptively chosen by $D^*$.
Let $\mathcal{E}$ denote the event which holds if there is no $(t+1)$-wise collision after the
evaluation of the initial hash function $h$ in $C_3$. That is, there is no subset $\mathcal{I} \subseteq [q]$
of size $|\mathcal{I}| = t + 1$ such that $h(k_0, x_i) = h(k_0, x_j)$ for all $i, j \in \mathcal{I}$. Below we
show that conditioned on $\mathcal{E}$, the outputs $y_1, \ldots, y_q$ (where $y_i := C_3(x_i)$) are
uniformly random, and thus have the same distribution as the outputs of $C_4$.
Using Theorem 1.(i) in [17] or the "fundamental Lemma" from [3], this means we
can upper bound the distinguishing advantage of any non-adaptive distinguisher
for $C_3$ and $C_4$ by the probability that the event $\mathcal{E}$ fails to hold, which means
we have a $(t+1)$-wise collision in $q$ $t$-wise independent strings over $\{0,1\}^t$. This
can be upper bounded as $q^t/2^{t^2}$.

We now show that the outputs of $C_3$ are uniform conditioned on $\mathcal{E}$. Consider
a subset $\mathcal{J} \subseteq [q]$ with $|\mathcal{J}| \le t$ such that $h(k_0, x_i) = h(k_0, x_j) = a$ for all
$i, j \in \mathcal{J}$. The fact that $R_{t,t \cdot 3n}(a)$ is uniformly random, together with the fact
that $h$ is a $t$-wise independent hash function, implies that the joint distribution
of $(h(R_{t,t \cdot 3n}(a), x_i))_{i \in \mathcal{J}}$ follows the uniform distribution. Now let $\mathcal{J}_1, \ldots, \mathcal{J}_{q'}$ be the
subsets of $[q]$ of size at most $t$, such that for $j \in \mathcal{J}_i$, $h(k_0, x_j) = a_i$ and all $a_i$'s
are distinct. The fact that $(R_{t,t \cdot 3n}(a_i))_{i \in [q']}$ follows the uniform distribution and
$\mathcal{J}_1, \ldots, \mathcal{J}_{q'}$ are of size at most $t$ implies that $(h(R_{t,t \cdot 3n}(a_i), x_j))_{i \in [q'], j \in \mathcal{J}_i}$ follows the
uniform distribution as well.


3 Informally, the statement we use is the following: given two systems $F$ and $G$ and an
event $\mathcal{E}$ defined for $F$, if $F$ conditioned on $\mathcal{E}$ behaves exactly as $G$, then distinguishing
$F$ from $G$ is at least as hard as making the event $\mathcal{E}$ fail.

4 The probability that any $t$ particular strings in $\{0,1\}^t$ collide is exactly
$(2^{-t})^{t-1} = 2^{-t^2+t}$; we get the claimed bound by taking the union bound over all $q^t/t!$
possible $t$-element subsets of the $q$-element set.

So far, we only established the indistinguishability of $C_3$ and $C_4$ against
non-adaptive distinguishers. We get the same bound for adaptive distinguishers using
Theorem 2 from [17], which (for our special case) states that adaptivity does not
help if the outputs of the system ($C_3$ in our case) are uniform conditioned on
the event we want to provoke. Very recently [11] found that the precondition
stated in [17] is not sufficient, but one additionally requires that the probability
of the event failing is independent of the outputs observed so far. Fortunately in
our case (and also for all applications in [17]) this stronger precondition is easily
seen to be satisfied.

A Intuition for Conjecture 1


We can think of the GGM construction with domain $\{0,1\}^m$ as a tree, where the
outputs are leaves at depth $m$. More generally, we can think of any construction
$F^G$ as a directed loop-free graph, which is separated in layers. Each invocation
starts at the root which holds the secret key $K$, and the computation follows
a path, crossing layers, where the path within each layer contains at most one
invocation of $G$.

We can define the "entropy" of a layer as the amount of randomness leaving
the layer (assuming $G$ is a uniformly random function.) In the GGM construction,
the first layer has $2n$ bits of randomness, namely $G(K)$, the $i$th layer has $2^i n$ bits
of randomness. In a construction $F^G$ contradicting the conjecture, there must
be a layer which has significantly more than twice as much randomness as the
layer before. To see this note that if each layer at most doubles the randomness,
we need $\log(q)$ layers to get the $qn$ bits of randomness. And moreover the last
layer must have $qn$ bits of randomness, as for a black-box security proof the only
source of randomness is $G$.

Now, if a layer more than doubles its randomness, it must be the case that
in this layer, $G$ is invoked on either (1) inputs that are not uniformly random,
or (2) the inputs to $G$ are not independent. In a black-box reduction from $F^G$
to $G$, one considers a series of hybrids $H_1, H_2, \ldots, H_{\ell}$, where $H_1$ is $F^G$ and $H_{\ell}$
is a random function. One gets from a hybrid $H_i$ to $H_{i+1}$ by replacing some
internal value $Y := G(X)$ with a uniform $U_{2n}$. If we have an adversary $A$ who
can distinguish $H_i$ from $H_{i+1}$, we can use it to tell if a random variable $Z$ has
distribution $U_{2n}$ or $G(U_n)$ by replacing $Y$ with $Z$ in $H_i$, and using $A$ to tell if
what we get is $H_i$ (which will be the case if $Z = G(U_n)$) or $H_{i+1}$.

In the above argument, it is crucial that $X$ has distribution $U_n$, but if (1) or
(2) holds, this will not be the case. It is hard to imagine a black-box technique
which works differently than by replacing some internal variables $Y$ with the
challenge $Z$, which as just explained will not work here.

Abstract. Computational extractors are efficient procedures that map a
source of sufficiently high min-entropy to an output that is computation- 
ally indistinguishable from uniform. By relaxing the statistical closeness 
property of traditional randomness extractors one hopes to improve the 
efficiency and entropy parameters of these extractors, while keeping their 
utility for cryptographic applications. In this work we investigate com- 
putational extractors and consider questions of existence and inherent 
complexity from the theoretical and practical angles, with particular fo- 
cus on the relationship to pseudorandomness. 

An obvious way to build a computational extractor is via the “extract- 
then-prg” method: apply a statistical extractor and use its output 
to seed a PRG. This approach carries with it the entropy cost inherent 
to implementing statistical extractors, namely, the source entropy needs 
to be substantially higher than the PRG’s seed length. It also requires a 
PRG and thus relies on one-way functions. 

We study the necessity of one-way functions in the construction of 
computational extractors and determine matching lower and upper 
bounds on the “black-box efficiency” of generic constructions of com- 
putational extractors that use a one-way permutation as an oracle. Un- 
der this efficiency measure we prove a direct correspondence between 
the complexity of computational extractors and that of pseudorandom 
generators, showing the optimality of the extract-then-prg approach for 
generic constructions of computational extractors and confirming the in- 
tuition that to build a computational extractor via a PRG one needs to 
make up for the entropy gap intrinsic to statistical extractors. 

On the other hand, we show that with stronger cryptographic primitives 
one can have more entropy- and computationally-efficient constructions. 
In particular, we show a construction of a very practical computational ex- 
tractor from any weak PRF without resorting to statistical extractors. 


1 Introduction 


Randomness extractors (or simply ‘extractors’) are algorithms that map sources 
of sufficient min-entropy to outputs that are statistically close to uniform. 


* See [DGKM11] for the full version of this paper.

Randomness extraction has become a central and ubiquitous notion in complexity
theory and theoretical computer science with innumerable applications and
surprising connections to other notions. Cryptography, too, has greatly benefited 
from this notion. Cryptographic applications of randomness extractors range from 
the construction of pseudorandom generators from one-way functions to the de- 
sign of cryptographic functionalities from noisy and weak sources (including ap- 
plications to quantum cryptography) to the more recent advances in areas such 
as leakage- and exposure-resilient cryptography, circular encryption, lattice-based 
cryptosystems, and more. Randomness extractors have also found important uses 
in practical applications, particularly for the construction of key derivation func- 
tions. In many of these cryptographic applications, the defining property of ran- 
domness extractors, namely, statistical closeness of their output to a uniform dis- 
tribution, can often be relaxed and replaced with computational indistinguisha- 
bility. Extractors that provide this relaxed guarantee are called computational ex- 
tractors, and they are the main object studied in this paper. 

Let us review informally some basic facts about statistical extractors and the
associated parameters $n, m, k, \delta$. A function $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^{\ell} \to \{0,1\}^m$
is a $(k, 2^{-\delta})$-statistical extractor if for any distribution $X$ on $\{0,1\}^n$ with
min-entropy $k$, the statistical distance between $\mathrm{Ext}(X, U_{\ell})$ and $U_m$ is at most $2^{-\delta}$,
where $U_{\ell}, U_m$ denote the uniform distribution over $\{0,1\}^{\ell}$, $\{0,1\}^m$, respectively.
Note that extractors are randomized via the second argument called a seed or
key (in our actual definitions we require the seed to be output, i.e., the so-called
strong extractor). We are interested in extractors where the values $k$ and $2^{-\delta}$ are
as small as possible (i.e., we want to minimize the entropy requirement from the
source and get as small as possible statistical distance of the output to uniform).
It is known how to construct statistical extractors that achieve $\delta = (k + \ell - m)/2$
[HILL99]. Radhakrishnan and Ta-Shma show that this bound
on $\delta$ is optimal, by showing how to build, for every extractor with parameters
as above, a source distribution of min-entropy $k$ for which the output of the
extractor is $2^{-\delta}$-far from uniform for $\delta = (k + \ell - m)/2$. In the sequel we refer
to this as the RT bound.

A major motivation to study computational extractors is that they allow 
us to go beyond the RT bound by replacing statistical closeness to uniform 
with computational indistinguishability. Indeed, an obvious way to do so is to 
first use a statistical extractor applied to the source distribution to obtain a 
short statistically close-to-uniform string and then use this string as a seed to a 
pseudorandom generator (PRG) to obtain more bits that are indistinguishable 
from uniform. We will refer to this as the extract-then-prg approach. 
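The two stages of the extract-then-prg approach can be sketched as follows. This is an illustrative toy rather than the paper's instantiation: the statistical stage is a Carter-Wegman style hash over a Mersenne prime (an assumed stand-in for an extractor based on universal hashing), and SHAKE-256 stands in for the PRG.

```python
import hashlib
import secrets

P = 2**521 - 1    # Mersenne prime; source samples are integers below P

def ext(x: int, seed: tuple, m_bits: int) -> int:
    """Stage 1 (assumed hash-based extractor): ((a*x + b) mod P) mod 2^m."""
    a, b = seed
    return ((a * x + b) % P) % (1 << m_bits)

def prg(short_seed: int, m_bits: int, out_bytes: int) -> bytes:
    """Stage 2 (stand-in PRG): stretch the short extracted string."""
    raw = short_seed.to_bytes((m_bits + 7) // 8, "big")
    return hashlib.shake_256(raw).digest(out_bytes)

def extract_then_prg(x: int, seed: tuple, m_bits: int = 128,
                     out_bytes: int = 64) -> bytes:
    return prg(ext(x, seed, m_bits), m_bits, out_bytes)

seed = (secrets.randbelow(P - 1) + 1, secrets.randbelow(P))   # public seed
print(extract_then_prg(123456789, seed).hex())
```

The source must carry enough min-entropy for stage 1 to output a full PRG seed, which is the entropy cost discussed next.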

While the latter is a natural way to build computational extractors, it is not 
the only one or necessarily the best one, especially when implemented in practi- 
cal settings. In particular, this approach carries with it the entropy limitations 
of statistical extractors as set by the RT bound, a serious concern in cases where 
the entropy of the source is too small to produce (via the statistical extractor) 
a sufficiently long key for the PRG. For example, consider the use of an
extractor to convert a 160-bit elliptic curve Diffie-Hellman value (which by the DDH
assumption has 160 bits of computational min-entropy) into a 128-bit seed for
an AES-based PRG. Applying a statistical extractor to the DH value only
guarantees a poor indistinguishability bound of $2^{-16}$ (i.e., $\delta = (160 - 128)/2$). If we
wanted to preserve, say, 100-bit security we would need $\delta = 100$, bringing the
required source entropy to 328 ($= 128 + 2 \cdot 100$).
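The arithmetic of this example is easy to reproduce. The helper names below are hypothetical; they simply mirror the two calculations in the example, which uses $\delta = (k - m)/2$ with $k$ the source min-entropy and $m$ the number of extracted bits.

```python
def rt_security_bits(k: int, m: int) -> float:
    """delta = (k - m) / 2, as in the example above."""
    return (k - m) / 2

def rt_required_entropy(m: int, delta: int) -> int:
    """Source min-entropy needed to reach statistical distance 2^(-delta)."""
    return m + 2 * delta

print(rt_security_bits(160, 128))      # 16.0
print(rt_required_entropy(128, 100))   # 328
```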

One way around this problem is to build dedicated computational extrac- 
tors based on cryptographic functions. Such an approach is taken in 
[DGH+04], where computational extractors are built using specific schemes
(HMAC and CBC) under assumptions that are specific to these schemes (and 
directed to the use of these extractors in the context of key derivation functions) 
including random-oracle type assumptions. On the other hand, the recent results 
of show that for some key derivation applications one may relax the 
entropy requirements dictated by the RT bound (see more discussion on these 
issues in Section [Z). 

In this work, we further investigate computational extractors and consider 
questions of existence and inherent complexity from the theoretical and prac- 
tical angles, with particular focus on the relationship to pseudorandomness. In 
particular, we ask how intrinsic is the use of pseudorandomness in constructing 
computational extractors, to what extent can we build computational extractors 
without resorting to a statistical extractor, and whether the “entropy penalty” 
of the extract-then-prg approach is avoidable. 


Our Results 


On the existence of computational extractors. The most basic question with
respect to computational extractors is whether they exist at all and, if they do,
under what (if any) assumption. The trivial answer is affirmative: statistical
extractors are also computational. But we are interested in non-trivial
computational extractors that output "more bits" than a statistical one. To capture this,
we define the notion of stretch. For a security parameter $p$ consider an extractor
acting on a $k(p)$-entropy source: its stretch $\sigma$ is the difference between the
extractor's output length and its input's min-entropy, i.e., $\sigma(p) = m(p) - k(p) - \ell(p)$.
Computational extractors with negative stretches of the form $-\omega(\log p)$ exist
unconditionally, since a statistical extractor (that matches the RT bound)
generates an output that is $2^{-\omega(\log p)}$-close to uniform and therefore is computationally
indistinguishable from uniform. Thus, non-trivial computational extractors are
those for which the stretch is at least $-O(\log p)$: we call such stretches and their
associated extractors proper. The fact that proper computational extractors can
be built on the basis of one-way functions via the extract-then-prg approach
raises the fundamental question: Are one-way functions necessary for building
proper computational extractors? One would expect the answer to be "of course
they are!". However, we can only provide a partial answer: We can show this
to be the case for proper extractors of positive stretch. But for stretches in
the range between $-O(\log p)$ and 0 the question remains open. Interestingly,


1 $\omega(\cdot)$ stands for any superlinear function (i.e., one that grows faster than any linear
function of its argument).

however, we can provide an affirmative answer under the assumption that the
RT bound applies to efficiently samplable distributions. We refer to this as the 
SRT Assumption (see details in Section B): 


SAMPLABLE RT (SRT) Assumption. Let $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^{\ell} \to \{0,1\}^m$
be a poly-time computable statistical extractor. Then, for $k < n$ there exists a
poly-time samplable source $X$ of min-entropy $k$ such that the statistical distance
between the distributions $\mathrm{Ext}(X, U_{\ell})$ and $U_m$ is at least $2^{-O(k+\ell-m)}$.


In other words, the SRT assumption strengthens the RT bound by requiring it 
to hold even if we restrict our attention to efficiently samplable distributions. 
On the other hand, it weakens the RT bound by only requiring it to hold for 
efficient extractors and by reducing the lower bound requirement to $2^{-c \cdot (k+\ell-m)}$
for any constant c (in the RT bound, c = 1/2). To the best of our knowledge, the 
validity of this assumption has not been settled. Interestingly, given our results, 
any resolution of the assumption will have significant consequences. Disproving 
the assumption would open the door to the possibility of more effective statistical 
extractors for applications that are only concerned with efficient sources; e.g., 
it would mean that extractors based on the Leftover Hash Lemma may not 
be the best in practice (a surprising conclusion that may actually indicate the 
plausibility that the SRT does hold). And if the SRT assumption does hold, then 
our work settles affirmatively the question of existential equivalence of proper 
computational extractors and one-way functions. 


Black-box constructions of proper extractors from OWPs. After investigating 
the relationship between proper extractors and one-way functions, we examine 
the question of whether we can have black-box constructions of proper
extractors from OWPs that are more efficient than going through the
extract-then-prg approach. As the measure of efficiency we use "OWP-complexity", namely,
the number of invocations to the OWP in a black-box construction,
following [GGKT05]. We prove a lower bound on the OWP-complexity of black-box
constructions of proper extractors from OWPs. We show that, under the SRT 
assumption, the OWP-complexity of the extract-then-prg construction is opti- 
mal by showing a tight lower bound on the number of invocations to the OWP 
for any black-box construction of proper extractors from OWPs. Interestingly, 
this result confirms the intuition that in order to build a proper computational 
extractor one needs to make up for the entropy gap intrinsic to the RT bound 
(as explained above). 

The above result applies to any black-box construction of a proper extractor 
that has oracle access to a OWP and it puts no restriction on the security reduc- 
tion (which efficiently transforms an extractor-attacker into a OWP-attacker). 
A more restricted form of black-box constructions, known as fully black-box, also
requires that the reduction between attackers be black-box (i.e., the reduction 
cannot access the code of the extractor-attacker). Interestingly, we prove a sim- 
ilar bound for fully black-box constructions, but unconditionally, i.e. without a 
need for the SRT assumption. Thus, we trade the more restricted form of
black-box reduction for a lower bound that fully dispenses with the SRT assumption.
(For a thorough treatment of the semi- and fully-black-box notions and their
meanings and implications please refer to [RTV04].) 


Constructions based on Stronger Primitives. Next, we investigate the possibility 
of avoiding the intrinsic entropy loss in the generic extract-then-prg construction 
by assuming stronger primitives as the basis for the construction. 

Our first result in this direction shows that given an exponentially-hard OWP, 
one can build a proper computational extractor where the OWP is applied di- 
rectly to the high-entropy source without having to go through an initial extrac- 
tion phase, hence avoiding the need to compensate for the entropy gap of the 
extractor. In order to achieve this result we replace the standard extract-then- 
prg approach with a dual prg-then-extract scheme that exploits the exponential 
hardness of the OWP to build a PRG that uses as its seed the very input from 
the high-entropy source.


A practical computational extractor based on wPRF. We show a very simple 
construction of computational extractors based on weak pseudorandom functions 
(i.e., PRFs whose output is indistinguishable from uniform by adversaries that 
only see values of the function computed on random independent inputs). For 
this we resort to a lemma by Pietrzak showing that weak PRFs retain
some of their security even when the keys are chosen from an imperfect source.
More specifically, the lemma shows that if the original keys are of length $n$ but they
are chosen from a source with min-entropy $k < n$ then their security degrades
roughly by an (optimal) factor of $2^{-(n-k)}$. This allows us to construct (strong)
computational extractors where the source distribution is used to sample a key 
for the PRF and the extractor’s random seed is used as an input to the PRF. This 
results in a very practical construction of computational extractors that fully 
dispenses with statistical extractors and perfectly fits the needs of randomness 
extraction in the context of key derivation functions (KDF) as studied in [Kra10] 
and as extensively used in real-world applications. In particular, one obtains a 
very practical KDF for cases where the input to the KDF (the source of key 
material) is at most of the size of the wPRF key. The security of the scheme 
solely depends on the security of the underlying (weak) PRF and it implies 
meaningful security bounds even in constrained cases where the entropy-output 
gap is small (or even negative). See Section [7] for details. 
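
As a rough illustration of the construction just described, the following Python sketch keys a PRF with the source sample and evaluates it on the public random seed. HMAC-SHA256 is used here purely as a stand-in for the weak PRF (an assumption of this sketch; the text does not fix a particular function), and the names `wprf_extract`, `sample` and `seed` are hypothetical.

```python
import hashlib
import hmac
import os


def wprf_extract(source_sample: bytes, seed: bytes) -> bytes:
    """Use the source sample as the PRF key and the extractor seed as the PRF input."""
    # Assumption: HMAC-SHA256 plays the role of the weak PRF.
    return hmac.new(source_sample, seed, hashlib.sha256).digest()


seed = os.urandom(32)               # public, uniformly random extractor seed
sample = b"imperfect key material"  # hypothetical output of the high-entropy source
okm = wprf_extract(sample, seed)    # derived key material (32 bytes)
```

Because the seed is public and independent of the source, the same seed can be reused across several samples, which is the usage pattern of a key derivation function.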


Relations to work on statistical extractors. While a main theme of our work is the 
role of pseudorandom generators in the construction of computational extractors, 
it is interesting to point out that pseudorandomness also plays a fundamental role 
in the development of statistical extractors. Starting with the work of Trevisan 
it has been realized that constructions of “non-cryptographic” pseudo- 
random generators such as can lead to efficient statistical extrac- 
tors. The notion of pseudorandomness in these works is usually weaker than the 
traditional cryptographic notion (that we use in our definition of computational 


2 This construction is somewhat reminiscent of the techniques used by Kalai et al.
in [KLR09] for building two-source or network extractors, though the context and
goals of these constructions are different.

extractors), e.g., they allow for super-polynomial (on the seed length) running
time or consider more limited adversaries. Also the focus on efficiency in statis- 
tical extractors has traditionally been geared towards minimizing the size of the 
random seed as this determines the utility of these extractors in derandomization 
applications. See for a survey of results in this area. It would be very inter- 
esting to find closer relations between results in the above area and the questions 
raised by our work. In particular, in spite of the large body of work on statistical 
extraction, there seems to be little work that investigates statistical extractors 
against (efficiently) samplable sources. The only paper on the subject that we 
are aware of is by Trevisan and Vadhan who show that if we only care 
about samplable distributions we can use deterministic extractors; however, this 
only works as long as the sampler of the source is computationally weaker than 
the extractor itself. Indeed, the same work shows that if we allow the source to depend
on the extractor and to have higher computational complexity then determinis- 
tic extraction is not possible. In terms of our SRT assumption, what this shows is 
that the SRT does apply to deterministic extractors (for each such extractor there 
is a samplable source where the extractor fails). For all we know, the seemingly 
fundamental question of the entropy bounds that apply to statistical extractors 
when acting on samplable sources has not been studied. We hope that our work 
will provide motivation to investigate this question. 


2 Proper Computational Extractors 


We recall the definitions of statistical extractors, define proper computational
extractors and give some of their basic properties. All extractor definitions
presented here are stated in an asymptotic setting; in Section [5] we provide
definitions in a concrete-complexity framework.


2.1 Preliminaries 


Terminology. A probability ensemble $X$ is an infinite sequence of probability
distributions $\{X_p\}$ indexed by a parameter $p$. We usually assume that
for all $p$, $X_p$ has support in $\{0,1\}^{n(p)}$ where $n(\cdot)$ is a polynomially bounded
function. For any integer $t$ we use the symbol $U_t$ to denote the uniform
distribution on $\{0,1\}^t$. The statistical distance between two probability ensembles
$X, Y$ with common support ensemble $\{0,1\}^{n(p)}$ is defined as the function
$\Delta_{X,Y}(p) = \max_{T \subseteq \{0,1\}^{n(p)}} |\Pr[X_p \in T] - \Pr[Y_p \in T]|$. We say that a
distribution $X$ has min-entropy $k(p)$ if for all $x$ in the support of $X_p$ it holds that
$\Pr_{X_p}[x] \le 2^{-k(p)}$. For simplicity, in what follows we assume that the entropies
denoted $k(p)$ are positive integers (in case $k(p)$ is not an integer, our results hold
by replacing it with $\lfloor k(p) \rfloor$).
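
For finite distributions the two quantities just defined are easy to compute directly. The sketch below is only an illustration (the function names and the toy distributions are hypothetical); it uses the standard fact that the maximum over sets $T$ equals half the $L_1$ distance between the probability vectors.

```python
from math import log2


def statistical_distance(px: dict, py: dict) -> float:
    """max_T |Pr[X in T] - Pr[Y in T]|, computed as half the L1 distance."""
    support = set(px) | set(py)
    return 0.5 * sum(abs(px.get(v, 0.0) - py.get(v, 0.0)) for v in support)


def min_entropy(px: dict) -> float:
    """Largest k such that every outcome has probability at most 2^-k."""
    return -log2(max(px.values()))


# Toy example on 2-bit strings: a biased source versus the uniform distribution.
X = {"00": 0.5, "01": 0.25, "10": 0.25}
U2 = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
dist = statistical_distance(X, U2)
k = min_entropy(X)
```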


Definition 1. An extractor family (or simply extractor) is an infinite family
$E = \{E_p\}$, indexed by a parameter $p$, of the form $E_p : \{0,1\}^{n(p)} \times \{0,1\}^{\ell(p)} \to
\{0,1\}^{m(p)}$ where the functions $n(p), \ell(p), m(p)$ are all polynomial in $p$. The
extractor family $E$ is called $(k(p), \varepsilon(p))$-statistical if for any probability ensemble
$X$ with support in $\{0,1\}^{n(p)}$ and min-entropy $k(p)$, it holds that the statistical
distance between $(U_{\ell(p)}, E_p(X_p, U_{\ell(p)}))$ and $U_{\ell(p)+m(p)}$ is at most $\varepsilon(p)$.


The probability distribution from which the first input is taken is called the
source and the second input is the seed. This definition of an extractor, requiring
the joint distribution of output and seed to be $\varepsilon$ statistically-close to uniform, is
sometimes referred to in the literature as a strong extractor. A weaker flavor of
this definition, referred to as a weak extractor, is one where one only considers the
distance between the output $E_p(X_p, U_{\ell(p)})$ and the uniform distribution $U_{m(p)}$
(without the seed, which may remain hidden). In this paper, unless otherwise
noted, an “extractor” refers to a strong extractor.

Intuitively, the goal of an extractor is to extract close-to-uniform bits out of a 
source with sufficiently high min-entropy, using a “short” uniformly random seed. 
We require that the output is longer than the seed,³ specifically that
$m(p) \ge \ell(p) + 1$.

Ideally, we’d like to extract all the randomness from the input, getting $m =
k + \ell$ truly uniform bits (with $\varepsilon = 0$). However, this is impossible in general.
From the results of Radhakrishnan and Ta-Shma [RTS00] we have the following
lemma (which holds even for weak extractors) showing a tight relationship between
how much of the input entropy $k + \ell$ can be extracted, and the distance $\varepsilon$ from
uniform.


Lemma 1 (RT Bound [RTS00]). Let $E$ be a $(k(p), \varepsilon(p))$-statistical extractor
with parameters $n(p), \ell(p), m(p)$ where $k(p) < n(p) - O(1)$⁴ and $\varepsilon(p) < 1/2$.
Then $\varepsilon(p) \ge 2^{-\frac{k(p)+\ell(p)-m(p)}{2}}$. That is, for every such $E$ there is a probability
ensemble $X$ with min-entropy $k(p)$ for which $E_p(X_p, U_{\ell(p)})$ has statistical
distance $\min\{\frac{1}{4}, 2^{-\frac{k(p)+\ell(p)-m(p)}{2}}\}$ from $U_{m(p)}$. This bound is tight and achieved, in
particular, by statistical extractors implemented via pairwise independent hash
functions.
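
To get a feel for the numbers, the bound of Lemma 1 can be evaluated for concrete parameters. The helper below is a minimal sketch (the function name and the sample values are hypothetical), with $k$, $\ell$ and $m$ given in bits as in the lemma.

```python
def rt_bound(k: int, l: int, m: int) -> float:
    """Distance min{1/4, 2^(-(k + l - m)/2)} from Lemma 1."""
    return min(0.25, 2.0 ** (-(k + l - m) / 2))


# An entropy loss of k + l - m = 128 bits gives distance 2^-64.
eps = rt_bound(k=256, l=128, m=256)

# When m exceeds k + l the exponent is positive and the bound is capped at 1/4.
capped = rt_bound(k=10, l=5, m=20)
```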


2.2 Proper Computational Extractors and Proper Stretch 


We start by defining computational extractors, which differ from statistical ones 
in that the output is only required to be computationally indistinguishable from 
uniform rather than statistically close, the extractor itself needs to be efficient, 
and it is only required to work on efficiently samplable distributions. 


Definition 2. A family $E$ of extractors is called $k(p)$-computational if $E_p$ is
polynomial-time computable, and for all efficiently-samplable probability ensembles
$X$ with min-entropy $k(p)$, the joint distribution $(U_{\ell(p)}, E_p(X_p, U_{\ell(p)}))$ is
computationally indistinguishable from $U_{\ell(p)+m(p)}$.


In this definition “efficiently samplable” means samplable by a polynomial-time 
algorithm and “computationally indistinguishable” refers to the regular notion 
of negligible advantage for all polynomial-time distinguishers. In a non-uniform 
setting, polynomial-time is replaced by poly-size circuits.


3 Without this condition, the trivial extractor that outputs its seed works for any 
source (even with 0 entropy). 
4 The symbol O(1) represents a specific constant calculated in [RTS00].

Discussion. The defined notion corresponds to a strong extractor (see Section
2.1). A weak computational extractor is defined similarly but only requiring
that the output $E_p(X_p, U_{\ell(p)})$ (without the seed) is indistinguishable from
uniform. Although our lower bounds hold even for weak extractors, we focus our
treatment on strong extractors, because in the computational setting, weak ex- 
tractors are not very interesting. Indeed, any PRF is, by definition, a weak 
computational extractor that works for any source distribution. 

We require the output of the computational extractor to be pseudorandom 
only when the input is an efficiently samplable distribution. Indeed, for com- 
putational uses (where we model feasible computation as polynomial-time) a 
hard-to-sample distribution is of little interest. In particular, we would not want 
to disqualify a good computational extractor just because it fails on a hard to 
compute source. Also, samplable sources allow the same seed (as long as it has
been chosen at random and independently of the source) to be used with
multiple samples; this is crucial in some applications, including key derivation
as discussed in Section [7].

At the same time, it is worth noting that we could consider a flavor of our 
definition where efficient samplability is replaced with oracle access (for the 
attacker) to an arbitrary distribution. The lower-bound results from Sections 
[3] and [4] hold for this definition, while the upper bound from Lemma [7] holds 
as long as the OWP is secure against non-uniform attackers (non-uniformity is 
necessary to argue that access to a hard-to-compute distribution does not help 
the attacker break the OWP or other primitives such as a PRG). Finally, we 
note that for our results on fully black-box reductions from Section [5] we do
consider the latter setting, namely, arbitrary distributions to which the attacker 
gets oracle access. 

It is clear that any efficient $(k(p), \varepsilon(p))$-statistical extractor for a negligible
$\varepsilon(p)$ is also a $k(p)$-computational extractor. Thus, the upper bound of Lemma
[1] implies the following.


Lemma 2. There exist extractors with parameters $n(p), \ell(p), m(p)$ that are
$k(p)$-computational for any $k(p) < n(p) - O(1)$ such that


$$k(p) = m(p) - \ell(p) + \omega(\log p) \qquad (1)$$


Note that the Lemma is unconditional, i.e., computational extractors with
parameters as in (1) exist unconditionally. In this sense, non-trivial computational
extractors are those whose parameters beat (1), and in particular have an output
that is (indistinguishable from but) statistically far from uniform. We call such 
extractors proper, defined as follows. 


Definition 3. The stretch $o(p)$ of a $k(p)$-computational extractor with parameters
$n(p), \ell(p), m(p)$ is defined as $o(p) = m(p) - k(p) - \ell(p)$. The stretch $o(p)$
is proper if $o(p) > -O(\log p)$ (i.e., there exists a constant $c$ such that $o(p) >
-c \log p$ for all $p$). A $k(p)$-computational extractor is proper if its stretch is
proper.

Note that the stretch does not only depend on the extractor but also on the
input entropy $k(p)$ (though, for simplicity, we sometimes omit the explicit $k(p)$
notation when talking about proper extractors). Since, for simplicity, we have
assumed that $k(p)$ is integer (or else we consider $\lfloor k(p) \rfloor$) then the stretch is
integer and can be negative, zero, or positive. Hereafter, when we say “proper 
extractor” we mean “proper computational extractor.” 


3 The Equivalence of Proper Extractors and One-Way 
Functions 


Note that statistical extractors have statistical distance from uniform of at least
$\varepsilon(p) = 2^{-\frac{k(p)+\ell(p)-m(p)}{2}} = 2^{o(p)/2}$, which is $1/\mathrm{poly}(p)$ (hence non-negligible) in the
case of proper extractors. Thus, statistical extractors do not immediately yield
proper computational extractors.
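
A small numeric check of this observation: with a proper stretch of $-c \log p$, the RT floor on the statistical distance is $p^{-c/2}$, an inverse polynomial. The values of `p` and `c` below are hypothetical and the helper is only a sketch.

```python
from math import isclose, log2


def statistical_floor(stretch: float) -> float:
    """RT lower bound 2^(stretch/2) on the distance from uniform, capped at 1/4."""
    return min(0.25, 2.0 ** (stretch / 2))


p = 1024                      # hypothetical value of the parameter p
c = 2                         # hypothetical constant in the proper-stretch condition
stretch = -c * log2(p)        # proper stretch -c*log p, here -20
floor = statistical_floor(stretch)   # 2^-10, i.e. p^(-c/2)
```

So a statistical extractor operating at proper stretch is distinguishable from uniform with inverse-polynomial advantage, which is why a computational assumption is needed.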

This raises the question: Do proper computational extractors exist? The following
Lemma answers this in the affirmative, assuming one-way functions exist.


Lemma 3. If one-way functions exist then strong proper computational extrac- 
tors exist too. 


Proof Sketch. Let $E = \{E_p\}$ be a $k(p)$-computational extractor with parameters
$n(p), \ell(p), m(p)$ for which equation (1) holds (such an extractor exists for any
functions $m(p), k(p)$ as in Lemma [2]). Also assume $\omega(\log p) < p$. Let $\{G_p\}$ be a
pseudorandom generator with seed length $m(p)$ and output length $k(p) + \ell(p)$
(assuming OWFs, PRGs exist for some function $m(p)$ and output length $m(p) + p$).
Construct extractor $E'$ that first applies $E$ and uses the output to seed the PRG.
It is easy to see that $E'$ has parameters $n(p), \ell(p), m'(p) = k(p) + \ell(p)$ and its
output is indistinguishable from $U_{m'(p)}$. But $m'(p) = k(p) + \ell(p)$, thus $E'$ is
proper.
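
The extract-then-prg shape of this construction can be sketched as follows. Everything concrete here is an assumption made for illustration: the pairwise-independent hash $x \mapsto ax + b \bmod P$ (truncated, hence only approximately uniform) stands in for the statistical extractor $E$, SHAKE-128 stands in heuristically for the PRG $G$, and all names and sizes are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # Mersenne prime; the hash works over the field Z_P


def pairwise_hash(x: int, a: int, b: int, out_bits: int) -> int:
    """Statistical-extractor stand-in: (a*x + b mod P), truncated to out_bits."""
    return ((a * x + b) % P) % (1 << out_bits)


def prg(seed: int, seed_bits: int, out_bytes: int) -> bytes:
    """PRG stand-in: expand a short seed with SHAKE-128 (heuristic assumption)."""
    seed_bytes = seed.to_bytes((seed_bits + 7) // 8, "big")
    return hashlib.shake_128(seed_bytes).digest(out_bytes)


def extract_then_prg(x: int, a: int, b: int,
                     seed_bits: int = 64, out_bytes: int = 32) -> bytes:
    """E': first extract a short near-uniform seed, then stretch it with the PRG."""
    return prg(pairwise_hash(x, a, b, seed_bits), seed_bits, out_bytes)


x = secrets.randbelow(P)                         # sample from the source
a, b = secrets.randbelow(P), secrets.randbelow(P)  # public extractor seed
out = extract_then_prg(x, a, b)
```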


Somewhat surprisingly we can’t immediately prove equivalence between proper 
extractors and one-way functions. The opposite direction of Lemma [3] can be 
easily proven only for proper computational extractors with positive stretch as 
shown in the following Lemma. 


Lemma 4. From any (even weak) computational extractor with positive stretch 
one can build a pseudorandom generator. 


Proof Sketch. Let $E$ be a $k(p)$-computational extractor with parameters $n(p)$,
$\ell(p)$, $m(p)$ and positive stretch $o(p)$, i.e., $m(p) > k(p) + \ell(p)$. We build a PRG $G$
with random seeds of length $s(p) = k(p) + \ell(p)$ and output length $m(p) > s(p)$.
$G$ partitions its seed into a $k(p)$-long value $x$ and an $\ell(p)$-long value $y$, and calls
$E$ on $(x', y)$ where $x'$ consists of $x$ padded with $n(p) - k(p)$ zeros. Clearly, the
input distribution to $E$ has entropy $k(p)$, hence its output is pseudorandom.
Since $G$ outputs more bits than its seed then $G$ is a pseudorandom generator.

The last two lemmas leave the following question: Does the existence of proper
computational extractors, even those with non-positive proper stretch imply the 
existence of one-way functions? In particular, is this the case for computational 
extractors of stretch 0? To provide an affirmative answer we need to resort to an 
additional assumption about the RT bound. 


Samplable RT (SRT) Assumption. For every polynomial-time computable
extractor $E$ with parameters $n(p), \ell(p), m(p)$ and every function $k$ such that
$k(p) < n(p) - O(1)$, there exists a poly-time samplable probability ensemble
$X$ of min-entropy $k$ such that the statistical distance between the distributions
$E_p(X_p, U_{\ell(p)})$ and $U_{m(p)}$ is at least $\min\{\frac{1}{4}, 2^{-c(k(p)+\ell(p)-m(p))}\}$.


In other words, we are assuming that if we restrict attention to efficiently 
samplable sources then the RT bound still applies. More accurately, we assume
a weaker bound where the RT bound $2^{-\frac{1}{2}(k(p)+\ell(p)-m(p))}$ is replaced with
$2^{-c(k(p)+\ell(p)-m(p))}$ for any constant $c$, possibly much larger than 1/2. In addition,
we assume this to be the case only for efficient extractors. This assumption
is not implied by the proof in [RTS00], which builds a source on which the
extractor incurs the claimed bound but this source may not be efficiently samplable.
Quite interestingly, the question raised by this conjecture does not seem to have 
been widely researched. Any answer to it, positive or negative, would be of in- 
terest. If true it implies the equivalence of proper computational extractors and 
pseudorandom generators (see Theorem [1]). If disproven it would open the
possibility of building efficient extractors that beat the RT and Leftover-Hash-Lemma
bounds on efficient sources. 


Lemma 5. Under the SRT assumption, the existence of a proper extractor im- 
plies the existence of a OWF. 


Proof Sketch. Let E be a proper k(p)-computational extractor and let X be 
a polynomial-time samplable ensemble of min-entropy k(p), then the output 
of E on X induces a polynomial-time samplable distribution that is statisti- 
cally far from uniform but computationally indistinguishable. Thus, the pair 
of distributions $(E_p(X_p, U_{\ell(p)}), U_{m(p)})$ are efficiently samplable, have statistical
distance greater than 1/poly(p) for some polynomial and are computationally 
indistinguishable. Using the results of [HILL99], constructing such a pair 
of distributions is sufficient to construct pseudorandom generators (PRG). This 
in turn implies the existence of OWF. 


From Lemmas [3] and [5] we get:


Theorem 1. Under the SRT assumption, proper computational extractors exist 
if and only if one-way functions exist. 


5 It is most likely (using a counting argument) that the conjecture does not hold for 
super-polynomial extractors, namely, there may be inefficient extractors that beat 
the RT bound on all efficiently samplable distributions.

4 The Cost of Black-Box Constructions of Proper
Extractors from OWPs 


In this section we follow the methodology from for quantifying the 
cost, as a number of OWP invocations, of (semi) black-box constructions of 
proper computational extractors from OWPs. We show a lower bound on the 
number of calls to the OWP that depends on the strength of the OWP and 
the stretch of the extractor. This result reflects the intuition that in order to 
build a computational extractor one needs to first make up for the entropy gap 
intrinsic to the RT bound. Indeed, the result shows that it is not enough to call 
the OWP just to generate as many bits as the extractor’s stretch but one needs 
to generate $\omega(\log p)$ additional bits to cover for the loss of entropy. Comparing
with the corresponding results of about pseudorandom generators, we 
see that making up for this entropy gap is the only intrinsic difference between 
proper extractors and PRGs (under this black-box complexity measure). We also 
prove that the lower bound is tight. 


Remark: Our lower bounds deal with constructions of computational extrac- 
tors from one-way permutations. However, we note that our results extend to 
the case of one-way functions since our lower bounds are proven using random 
permutations which are not efficiently distinguishable from one-way functions. 
However we do not know if for the case of OWF our bounds are tight (i.e. the 
currently known constructions based on OWF have a larger number of queries). 

In Section [3] we showed that proper extractors are equivalent to one-way
functions. Here we formalize a notion of black-box constructions for computational
extractors: such constructions access a one-way function as an oracle, rather 
than having access to the code of an algorithm computing it. 

We start by developing an analogue of the treatment from to the 
asymptotic setting of our analysis. For any integers $t, n$, $t < n$, we denote by $\Pi_n$
the set of all permutations over $\{0,1\}^n$ and by $\Pi_{t,n}$ the set of permutations in
$\Pi_n$ that arbitrarily permute the first $t$ bits of input while leaving the remaining
$n - t$ bits fixed.
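
For small parameters a member of $\Pi_{t,n}$ can be sampled explicitly, which makes the definition concrete. In this sketch $n$-bit strings are represented as integers and "the first $t$ bits" are taken to be the $t$ most significant bits; both choices, and the function name, are assumptions of the illustration.

```python
import random


def sample_pi_t_n(t: int, n: int, rng: random.Random):
    """Return a permutation of n-bit integers that permutes the top t bits
    by a uniformly random table and leaves the low n - t bits unchanged."""
    table = list(range(1 << t))
    rng.shuffle(table)
    low_mask = (1 << (n - t)) - 1

    def pi(w: int) -> int:
        prefix, suffix = w >> (n - t), w & low_mask
        return (table[prefix] << (n - t)) | suffix

    return pi


pi = sample_pi_t_n(t=3, n=5, rng=random.Random(0))
images = [pi(w) for w in range(1 << 5)]
```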

For a security parameter $p$ denote with $n(p), k(p), \ell(p), m(p)$ and $t(p)$ integer
functions that grow polynomially in $p$. Assume also that $t(p) < n(p)$ and $k(p) <
n(p) - O(1)$ for all $p$. Consider an infinite family of permutations $\Pi = \{\pi_p\}_{p=1}^{\infty}$
where $\pi_p$ is chosen in $\Pi_{n(p)}$. We say that $\Pi$ is $T(p)$-hard if for sufficiently large $p$,
any attacker running in time $T(p)$ succeeds in inverting $\pi_p$ with probability less
than $1/T(p)$. We say that $\Pi$ is one-way if it is $T(p)$-hard for every polynomial
$T(\cdot)$.

With $\Pi^*$ we denote such a family $\Pi^* = \{\pi^*_p\}_{p=1}^{\infty}$ where each permutation
$\pi^*_p$ is chosen at random from the set $\Pi_{t(p),n(p)}$. The following Lemma (based on
[R89]) proves that for any hardness $T(p)$, if we choose $t(p) = 3\log T(p)$ (and an
additional technical condition that $t(p) > 6\log p$), then this family is $T(p)$-hard
with probability 1.


Lemma 6. Let $t(p) > 6\log p$. Then with probability 1, $\Pi^*$ constructed as above
is $T(p)$-hard for $T(p) = 2^{t(p)/3}$.

Proof. Let $A$ be an adversary that runs in time $T(p)$ and attempts to invert $\Pi^*$.
In expectation, over the choice of $\pi_p$, $A$ succeeds in inverting with probability
$T(p)/2^{t(p)} = 1/T^2(p)$, namely:


$$\mathop{\mathbb{E}}_{\pi_p \in \Pi_{t(p),n(p)}} \Big[ \Pr_{x \sim U_{n(p)}} [A(\pi_p(x)) = x] \Big] = 1/T^2(p).$$


Using Markov’s inequality we have that the probability over the choice of $\pi_p$
that $A$ inverts successfully with probability better than $T(p) \cdot 1/T^2(p)$ is at most
$1/T(p)$:


$$\Pr_{\pi_p \in \Pi_{t(p),n(p)}} \Big( \Pr_{x \sim U_{n(p)}} [A(\pi_p(x)) = x] > T(p) \cdot 1/T^2(p) \Big) \le 1/T(p). \qquad (2)$$


Since by choice of $t(p) > 6\log p$ we have $1/T(p) < 1/p^2$ we get that the sum
$\sum_{p < \infty} 1/T(p)$ is finite. The convergence of this sum allows us to apply the
Borel-Cantelli Lemma to (2) which implies that with probability 1 over the choice of
$\Pi^*$ the inequality $\Pr_{x \sim U_{n(p)}}[A(\pi_p(x)) = x] < 1/T(p)$ (where $A$ is assumed to
run in time $T(p)$) holds for all but a finite number of $p$’s. In other words, with
probability 1 over the choice of $\Pi^*$, the resultant family $\Pi^*$ is $T(p)$-hard.
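
The convergence step can be checked numerically. The sketch below takes, as a hypothetical choice, the smallest integer $t(p)$ with $t(p) \ge 6\log_2 p$ (the lemma asks for a strict inequality, which only makes the terms smaller) and confirms that $T(p) = 2^{t(p)/3} \ge p^2$, so the partial sums of $1/T(p)$ stay below $\sum_{p \ge 2} 1/p^2 < 0.645$.

```python
from math import ceil, log2


def hardness(t: int) -> float:
    """T = 2^(t/3), the hardness level used in Lemma 6."""
    return 2.0 ** (t / 3)


# Hypothetical choice: the smallest integer t(p) with t(p) >= 6*log2(p).
terms = [(p, hardness(ceil(6 * log2(p)))) for p in range(2, 10001)]
partial_sum = sum(1.0 / T for _, T in terms)
```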


Definition 4. An oracle extractor construction (from a one-way permutation)
is a family of oracle procedures $\mathcal{E}^{(\cdot)} = \{E_p^{(\cdot)} : \{0,1\}^{n(p)} \times \{0,1\}^{\ell(p)} \to \{0,1\}^{m(p)}\}$
such that $E_p^{(\cdot)}$ expects as an oracle a permutation $\pi_p \in \Pi_{n(p)}$ and $E_p^{(\cdot)}$ is
computable in time polynomial in $p$. We say that $\mathcal{E}^{(\cdot)}$ has black-box access to a family
$\Pi = \{\pi_p\}_{p=1}^{\infty}$ (and denote it as $\mathcal{E}^{\Pi}$) if $E_p^{(\cdot)}$ uses $\pi_p \in \Pi$ as its oracle.

We say that $\mathcal{E}^{(\cdot)}$ is a $k(p)$-computational oracle extractor if for every
one-way family $\Pi$ the family $\mathcal{E}^{\Pi}$ is a $k(p)$-computational extractor according to
Definition [2].


Another way to restate the above definition is that there must be an efficient 
reduction from distinguishing the output of the extractor from uniform to in- 
verting the permutation family. In other words, any distinguishing adversary 
can be used to construct an inverter for the permutation family. Note that the 
above definition formalizes the notion of semi black-box construction in which the 
construction (the extractor) has oracle access to the underlying primitive (the 
one-way permutation), but no restriction is made on the reduction (in particu- 
lar, the reduction might be able to access the code of the adversary). The more 
restricted notion of fully black-box constructions (in which additionally the secu- 
rity reduction only has oracle access to the adversary breaking the construction) 
will be discussed in Section [5].

We now state the main theorem in this section. It shows that under the SRT
assumption, proving a semi-black-box construction of a computational extractor
for which $q(p) \cdot t(p) - o(p) = O(\log p)$ is at least as hard as proving that OWFs
exist (or, equivalently, proving such a construction is at least as hard as proving
that the SRT assumption implies OWF).

Theorem 2. Let $\mathcal{E}^{(\cdot)}$ be a proper $k(p)$-computational oracle extractor according
to Definition [4] which has access to a $T(p)$-hard family where $T(p)$ is
super-polynomial. Let $t(p) = 3\log T(p) = \omega(\log p)$. Assuming SRT, if $E_p^{\pi_p}$ has proper
stretch $o(p)$ and it calls the oracle $\pi_p$ a total of $q(p)$ times, then $q(p) \cdot t(p) - o(p) =
\omega(\log p)$ or else one-way functions exist. This lower bound on $q(p)$ is tight.


Proof. Let $\mathcal{E}^{(\cdot)}$ be a proper $k(p)$-computational oracle extractor with parameters
$(n(p), \ell(p), m(p))$ and proper stretch $o(p) = m(p) - k(p) - \ell(p)$. By assumption
$\mathcal{E}^{\Pi}$ is $k(p)$-computational whenever the oracle $\Pi$ is implemented with a
one-way permutation family, i.e., $\Pi$ is $T(p)$-hard where $T(p)$ is a function growing
faster than any polynomial. In particular, by Lemma [6] this is the case
(with probability 1) when $\Pi$ is implemented by the family $\Pi^*$ with parameter
$t(p) = 3\log T(p) = \omega(\log p)$. We will show that if $E_p^{\pi_p}$ calls $\pi_p \in \Pi^*$ a total
of $q(p)$ times, we can construct a computational extractor $E'_p$ with parameters
$(n(p), \ell'(p) = \ell(p) + q(p)t(p), m(p))$ (and no oracle calls) such that for any
distribution $X_p$ with min-entropy $k(p)$, the output distributions $E'_p(X_p, U_{\ell'(p)})$ and
$E_p^{\pi_p}(X_p, U_{\ell(p)})$ are $q^2(p)/2^{t(p)}$ statistically close, and since the latter distribution
is pseudorandom so is the former (here we use the fact that $q^2(p)/2^{t(p)}$ is
negligible since $q(p)$ is polynomial and $2^{t(p)} = T^3(p)$ is super-polynomial).

More specifically, we construct $E'_p : \{0,1\}^{n(p)} \times \{0,1\}^{\ell'(p)} \to \{0,1\}^{m(p)}$, where
$\ell'(p) = \ell(p) + q(p) \cdot t(p)$, in the following way: Let $x, z'$ denote the input to
$E'_p$. The string $x$ and the first $\ell(p)$ bits of $z'$ are used by $E'_p$ to define the
input $(x, z)$ to $E_p^{(\cdot)}$ and the remaining bits of $z'$ are used to select $q(p)$ distinct
elements $y_1, \ldots, y_{q(p)} \in \{0,1\}^{t(p)}$. We then define:
$E'_p(x, z, y_1, \ldots, y_{q(p)}) = E_p^{(y_1, \ldots, y_{q(p)})}(x, z)$, namely, when $E_p^{(\cdot)}$ presents its
$i$-th query to its oracle, call it $w_i$, we return as response the string $y_i$ followed
by the last $n(p) - t(p)$ bits of $w_i$.

Note that as long as all the $y_i$’s are different the output distributions
$E'_p(X_p, U_{\ell'(p)})$ and $E_p^{\pi_p}(X_p, U_{\ell(p)})$ are identical. The probability of a repeated
$y_i$ is $q^2(p)/2^{t(p)}$ and therefore the actual statistical distance between these
distributions is negligible. In particular, we have that the output from $E'_p$ is
indistinguishable from random and therefore $E'_p$ is a $k(p)$-computational extractor
which makes no oracle calls. Moreover, its stretch $o'(p)$ equals


$$o'(p) = m(p) - k(p) - \ell'(p) = m(p) - k(p) - \ell(p) - q(p)t(p) = o(p) - q(p)t(p).$$


If, for the sake of contradiction, we assume that $q(p)t(p) < o(p) + c\log p$ for
some constant $c$ then we would get $o'(p) > -c\log p$ meaning that $E'_p$ is a regular
(non-oracle) proper computational extractor from which, using Lemma [5] and the
SRT assumption, we can construct a one-way function. This proves the theorem
(the tightness of the bound on $q(p)$ is proven in Lemma [7] below).
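
The oracle replacement used in this proof can be sketched in a few lines. As before, $n$-bit strings are represented as integers with the first $t$ bits as the most significant ones; the function names are hypothetical, and the sketch assumes (as the proof implicitly does) that the extractor's oracle queries are answered in order, one fresh $y_i$ per query.

```python
def make_simulated_oracle(ys, t: int, n: int):
    """Answer the i-th query w_i with y_i followed by the last n - t bits of w_i."""
    answers = iter(ys)
    low_mask = (1 << (n - t)) - 1

    def oracle(w: int) -> int:
        y = next(answers)            # fresh t-bit string taken from the longer seed
        return (y << (n - t)) | (w & low_mask)

    return oracle


def collision_bound(q: int, t: int) -> float:
    """Bound q^2 / 2^t on the probability that two of the q strings y_i coincide."""
    return q * q / 2 ** t


oracle = make_simulated_oracle(ys=[0b101, 0b010], t=3, n=5)
first = oracle(0b00011)   # 0b101 followed by 0b11
second = oracle(0b11110)  # 0b010 followed by 0b10
```

Each simulated answer consumes $t(p)$ seed bits, which is exactly where the extra $q(p)t(p)$ bits of seed length in $\ell'(p)$ come from.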


Lemma 7. The bound of Theorem [2] is tight: For any function $o(p)$, polynomial
in $p$, and any function $W(p)$ that grows as $\omega(\log p)$ there is a black-box
construction of a strong proper extractor from OWP that attains stretch $o(p)$ and calls
the OWP $q(p)$ times such that $q(p)t(p) < o(p) + W(p)$.

Proof Sketch. We start by noting that there are black-box constructions of
pseudorandom generators from OWPs that for any PRG-stretch function $o'(p)$
(defined as the length of the PRG output less the length of PRG seed) call the
OWP $o'(p)/t(p)$ times where $t(p)$ is defined as in Theorem [2]. This is the case,
in particular, for the Blum-Micali construction using Goldreich-Levin hard-core
bits. Therefore, to prove the Lemma it suffices to show how to build a proper
extractor of stretch $o(p)$ using a PRG of stretch $o(p) + W(p)$ for any $W(p)$ that
grows as $\omega(\log p)$.

Let $G = \{G_p\}$ be a PRG family, indexed by a parameter $p$, with seed length
$s(p)$ and output size $r(p) = s(p) + o(p) + W(p)$, for a given (polynomial in
$p$) function $o(p)$. Assume $G$ is $(T(p), \varepsilon(p))$-secure (where $\varepsilon(p)$ is negligible in
$p$). Let $E$ be a strong statistical extractor (e.g., based on pairwise independent
hash functions) with parameters $n(p), \ell(p), m(p) = s(p) + \ell(p)$ that on input
distributions of min-entropy $k(p)$ outputs a distribution that is $2^{-\frac{k(p)-s(p)}{2}}$-close
to $U_{m(p)}$. Using both $G$ and $E$ we build a proper computational extractor $E'$
with parameters $n(p), \ell(p), m'(p) = r(p) + \ell(p)$. On input $(x, z)$, $E'$ calls $E$ on
$(x, z)$ and uses the $s(p)$-bit output from $E$ as the seed to $G$ to produce an output
of bit length $r(p) = s(p) + o(p) + W(p)$. This, plus the $\ell(p)$-bit input salt, are
the outputs from $E'$.

Note that on distributions of min-entropy $k(p) = r(p) - o(p)$, $E'$ has stretch
$o(p)$; moreover, we claim that the output from $E'$ is $(T(p), \varepsilon'(p))$-indistinguishable
from uniform where $\varepsilon'(p)$ equals $\varepsilon(p)$ plus a negligible term $2^{-W(p)/2} = 2^{-\omega(\log p)}$.
Indeed, the only loss of security with respect to $G$ is in the derivation of the
seed $z \in \{0,1\}^{s(p)}$ that is chosen from a distribution that is $2^{-(k(p)-s(p))/2} =
2^{-W(p)/2} = 2^{-\omega(\log p)}$-close to $U_{s(p)}$. Thus $E'$ is a proper computational extractor
with stretch $o(p)$ built on the basis of a PRG of stretch $o(p) + W(p)$ which, as
said, implies the tightness of the bound.
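
The parameter bookkeeping in this proof sketch is easy to verify mechanically. The helper below simply recomputes the quantities named in the text for one hypothetical choice of $s$, $\ell$, $o$ and $W$ (all in bits); the function name and the sample values are illustrative assumptions.

```python
def lemma7_parameters(s: int, l: int, o: int, W: int) -> dict:
    """Recompute the parameters of E' from the proof sketch of Lemma 7."""
    r = s + o + W            # PRG output length
    k = r - o                # min-entropy of the source
    m_prime = r + l          # output length of E' (PRG output plus salt)
    return {
        "k": k,
        "m_prime": m_prime,
        "stretch": m_prime - k - l,         # should equal o
        "seed_gap": k - s,                  # should equal W
        "stat_error": 2.0 ** (-(k - s) / 2),  # distance of the PRG seed from uniform
    }


params = lemma7_parameters(s=128, l=128, o=64, W=80)
```

With these values the stretch is 64 bits while the statistical loss in deriving the PRG seed is only $2^{-40}$, showing how the extra $W$ bits pay for the entropy gap.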


Note. The Blum-Micali construction with a randomized hardcore like Goldreich- 
Levin [GL], requires extra perfect but non-secret randomness. Hence this auxil- 
iary randomness can be supplied by the extractor’s seed and be output as part 
of the strong extractor’s output. 


5 Unconditional Fully Black-Box Lower Bound 


Next, we pose the question of what can be shown without assuming SRT. We 
show that by restricting our attention to fully black box constructions, not only 
can we get rid of the SRT but actually can show an unconditional lower bound 
on the number of OWP invocations. 

We first show an analogous lower bound to the semi black-box case (though 
unconditional) in the asymptotic, uniform setting. We then show a tighter 
concrete-complexity result in the non-uniform setting. 

To begin, we review the notion of fully black box construction/reduction. 


Definition 5. A fully black-box reduction from a primitive Q to a primitive P
is a pair of oracle PPT Turing machines $(G^{(\cdot)}, S^{(\cdot)})$ such that the following two
properties hold:

Correctness: For every implementation $f$ of primitive P, $g = G^f$ implements Q.


Security: For every implementation $f$ of primitive P, and every adversary $A$,
if $A$ breaks $G^f$ (as an implementation of Q) then $S^{A,f}$ breaks $f$. (Thus, if $f$ is
“secure”, then so is $G^f$.)


Notice that in a fully black-box reduction, the adversary is only accessed as an
oracle. One consequence of this fact is that the adversary does not have to 
be efficient. We remark that an implementation of a primitive is any specific 
scheme that meets the requirements of that primitive (e.g., an implementation 
of a public-key encryption scheme provides samplability of key pairs, encryption 
with the public-key, and decryption with the private key). 


5.1 Unconditional Lower Bound in the Asymptotic, Uniform 
Setting 


In this section we show an analogue of the lower bound in Theorem [2] for the 
fully black-box setting. While the bound on the number of queries is the same as
in Theorem [2], this result can be proven unconditionally (i.e., without requiring
the SRT and without concluding that a construction that violates the bound
implies a proof of the existence of one-way functions). However, Theorem [3]
holds only when we consider a slightly modified definition of computational 
extractors where the output of the extractor is required to be computationally 
indistinguishable from uniform for every input probability ensemble X of min- 
entropy k. Observe that the construction outlined in Lemma [7] satisfies this 
stronger notion of security. 


Theorem 3. Let $\mathcal{E}^{(\cdot)}$ be a proper $k(p)$-computational fully black box extractor
construction, which has access to a $T(p)$-hard family where $T(p)$ is super-polynomial.
Further assume that such extractor remains proper $k(p)$-computational on any
$k(p)$-entropy source, including those that are not efficiently samplable. Let $t(p) =
3\log T(p) = \omega(\log p)$. If $E_p^{\pi_p}$ has proper stretch $o(p)$ and it calls the oracle $\pi_p$ a
total of $q(p)$ times, then $q(p) \cdot t(p) - o(p) = \omega(\log p)$.

Proof. See full version [DGKM11].


Next, we present a stronger version of this result. It will be a tighter concrete 
(rather than asymptotic) lower bound, for non-uniform fully black-box construc- 
tions of proper extractors from OWP. In order to do that, we need to revisit 
definitions and preliminary Lemmas in a concrete, non-uniform context. 


5.2 Unconditional Lower Bounds in the Concrete, Non-uniform 
Setting 


We start by adapting the definition of (oracle) computational extractors to the 
non-uniform and concrete (i.e., non-asymptotic) complexity setting. 

We say that a permutation $\pi$ over $\{0,1\}^n$ is $S$-hard if no circuit of size $< S$ and
oracle access to $\pi$ can invert $\pi$ with probability better than $1/S$. Additionally,
we say that two distributions are $(S, \varepsilon)$-indistinguishable if no circuit of size $< S$
can distinguish between them with probability better than $\varepsilon$.


Definition 6. $E : \{0,1\}^n \times \{0,1\}^\ell \to \{0,1\}^m$ is a $(k, S, 2^{-\delta})$-computational
extractor (CompEXT) if for any distribution $X$ on $\{0,1\}^n$ with $H_\infty(X) \ge k$,
we have that $(E(X, U_\ell), U_\ell)$ and $U_{m+\ell}$ are $(S, 2^{-\delta})$-indistinguishable (where
indistinguishability holds even for circuits given oracle access to a Sampler which
samples from distribution $X$).


Definition 7. An oracle computational extractor (OCompEXT) construction
(from a one-way permutation) is an oracle procedure $E^{(\cdot)} : \{0,1\}^n \times \{0,1\}^\ell \to
\{0,1\}^m$ that expects as an oracle a permutation $\pi \in \Pi_n$. We are interested in
constructions where $E^{(\cdot)}$ is computable in time polynomial in $n$.

We say that $E^{(\cdot)}$ is a $(k, S_\pi, S_E, 2^{-\delta})$-OCompEXT construction from OWP
if for every permutation $\pi$ that is $S_\pi$-hard, $E^\pi$ is a $(k, S_E, 2^{-\delta})$-secure CompEXT
(where indistinguishability holds even for circuits given oracle access to both
Sampler and $\pi$).


Using a standard averaging argument, the existence of a non-uniform attacker 
that succeeds in inverting a OWP with the help of such an oracle implies the 
existence of another attacker (of slightly larger size) that inverts the OWP without 
access to the oracle (simply hard-wire into the attacker circuit the source samples 
that maximize the attacker’s inverting probability). 

We now restate the lower bound of Radhakrishnan and Ta-Shma 
regarding the efficiency of statistical extractors (which was given in Lemma 1 
for the asymptotic, uniform setting). 


Lemma 8. Let E' : {0,1}^n × {0,1}^ℓ → {0,1}^m be a statistical extractor. Then, 
for any k < n - C' there exists a distribution X of min-entropy k such that the 
two distributions E'(X, U_ℓ) and U_m are statistically min{1/4, 2^{-(k+ℓ-m+C)/2}} 
far, where C and C' are universal constants. 


We are now ready to state our main result in this section, namely, a lower 
bound on the number of queries to the OWP by a fully black box construction 
of a computational extractor. 


Theorem 4. Let E^{(·)} be a fully black-box construction of a (k, S_π, S_E, 2^{-δ}) 
proper oracle extractor which expects an S_π-hard one-way permutation π over 
n bits. Assume that E^{(·)} makes q < S_π queries to its oracle and that S_E < S_π 
and 2^{-δ} > 1/S_π². If E^{(·)} has proper stretch o then E^{(·)} must call the one-way 
permutation q times, where q > (2δ + o - C)/(5 log S_π) for some constant C. 


Proof. See full version [DGKM11]. 


6 Construction from Exponentially-Hard One-Way 
Permutations 


The results from Sections 4 and 5 indicate the optimality of the “extract-then-prg” 
approach when all we are interested in is minimizing the number of calls to 
a OWP in a black-box construction. However, a significant cost of this approach 
is that in order to use an n-bit OWP we need to start with an input distribution 
whose entropy is noticeably larger than n so we can apply the extraction part of 
the construction to it and still get n bits that are close to uniform and serve as 
input to the OWP. Here we show that one can make up for the entropy gap if the 
OWP has exponential hardness. In this case, we show a black-box construction 
based on such a OWP where one applies the OWP directly on the entropy source 
without an intermediate extractor step. For this we reverse the extract-then-prg 
approach and use instead a “prg-then-extract” construction where the OWP 
is applied first to expand (pseudo) entropy and then a statistical extractor is 
applied on this expanded entropy to generate a close-to-uniform output.⁶ 


The Construction. Given an (S_π, 2^{-δ}/2^{n-k})-hard OWP π, we present a 
construction of a k-entropy strong computational extractor 

F : {0,1}^n × {0,1}^{(2δ+o)n+ℓ} → {0,1}^{(2δ+o)n+ℓ+k+o} 

with proper stretch o in Figure 1 (here ℓ denotes the seed length of the statistical 
extractor F' used in Step 2). 


On input (x, z' = (r_0, ..., r_{2δ+o-1}, z)), 
the extractor F does the following: 


Step 1: 
- Compute (w_1, w_2) = 
((π^{2δ+o}(x), ⟨r_{2δ+o-1}, π^{2δ+o-1}(x)⟩, ..., ⟨r_0, x⟩), (r_{2δ+o-1}, ..., r_0)) 


Step 2: 
- Let F' : {0,1}^{n+2δ+o} × {0,1}^ℓ → {0,1}^{k+ℓ+o} be a statistical (k+2δ+o, 2^{-δ}) 
strong extractor. 
- Compute (v, z) = F'(w_1, z). 
Step 3: F outputs (v, z, w_2) ∈ {0,1}^{(2δ+o)n+ℓ+k+o}. 


Fig. 1. STRONG COMPUTATIONAL EXTRACTOR FROM EXPONENTIALLY-HARD OWP 


The proof of Lemma 10 that F is indeed a strong extractor when π is an 
exponentially-hard OWP is based on the following lemma showing that 
exponentially-hard OWPs are “hard to invert” on arbitrary distributions of 
sufficiently high min-entropy. 


Lemma 9. Let π : {0,1}^n → {0,1}^n be an (S, ε)-one-way permutation and let 
X be a distribution over {0,1}^n of min-entropy k where k = n - α. Then for all 
adversaries A of size at most S it is the case that: 

Pr_{x←X} [A(π(x)) = x] ≤ ε · 2^α. 
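
The bound of Lemma 9 can be seen by a short counting argument. The following is only a sketch for a deterministic adversary A; the set G_A of inputs on which A succeeds is notation introduced here for illustration and does not appear in the original.

```latex
\begin{align*}
\Pr_{x \leftarrow X}[A(\pi(x)) = x]
  &= \sum_{x \in G_A} \Pr[X = x]
     && G_A := \{x \in \{0,1\}^n : A(\pi(x)) = x\} \\
  &\le |G_A| \cdot 2^{-k}
     && \text{since } X \text{ has min-entropy } k \\
  &= 2^{n-k} \cdot \Pr_{x \leftarrow U_n}[A(\pi(x)) = x]
     && \text{since } |G_A| = 2^{n} \cdot \Pr_{x \leftarrow U_n}[A(\pi(x)) = x] \\
  &\le \varepsilon \cdot 2^{\alpha}
     && \text{since } \pi \text{ is } (S,\varepsilon)\text{-one-way and } \alpha = n - k .
\end{align*}
```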
⁶ [BDK+11] also uses the prg-then-extract approach for constructing an extractor; in 
their case, however, the prg is used to expand the seed rather than for increasing 
the computational entropy of the source as in our case. 
Lemma 10. The construction of F from Figure 1 is a black-box construction of 
a (k, S_π/poly(n), poly(n) · 2^{-δ})-strong CompEXT with proper stretch o from 
any (S_π, 2^{-δ}/2^{n-k})-OWP π. 


Proof. See full version [DGKM11]. 


7 Practical Computational Extractors from Weak PRF 


In this section we explore a connection between computational extractors and 
pseudo-random functions. We show a very efficient construction of a strong com- 
putational extractor using any PRF, and demonstrate its practical utility in the 
context of key derivation functions. Actually, we do not need the full security of 
a PRF; it suffices that the PRF is secure against attackers that do not choose 
inputs to the function but only see pairs of (input, output) where the inputs are 
chosen uniformly at random. Such PRFs are referred to as weak PRF (wPRF) 
(note that in our application ‘weak’ is stronger). The proof of our scheme follows 
directly from recent results by Pietrzak about leakage-resilient wPRFs. 


Weak PRF. A pseudo-random function family is a family of functions ℱ = 
{f_a : {0,1}^ℓ → {0,1}^m}_{a ∈ {0,1}^n} with the property that if a is chosen uniformly 
at random in {0,1}^n, then the function f_a is computationally indistinguishable 
from a random function from {0,1}^ℓ to {0,1}^m. More specifically, no efficient 
algorithm which has oracle access to either f_a or to a random function can 
decide which is the case. If the oracle access is restricted to query the function 
on randomly chosen inputs, one obtains the notion of weak PRF (wPRF). We 
quantify this notion by saying that ℱ is an (S, q, ε)-wPRF family if no circuit 
of size S can distinguish between f_a (for a chosen uniformly at random) and 
a random function with advantage better than ε when seeing the value of the 
function on q random inputs. 
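
To make the restricted oracle access concrete, here is a minimal sketch of what a wPRF attacker gets to see. HMAC-SHA256 is used as a stand-in for f_a, and the helper names and byte lengths are assumptions made for illustration only; they are not taken from the paper.

```python
import hashlib
import hmac
import secrets

def f(a: bytes, x: bytes) -> bytes:
    """Stand-in for f_a(x). HMAC-SHA256 is an assumption, not the paper's choice."""
    return hmac.new(a, x, hashlib.sha256).digest()

def wprf_samples(q: int, ell_bytes: int = 16, n_bytes: int = 32, real: bool = True):
    """Return q pairs (x, y) with uniformly random x, as seen by a wPRF attacker.

    real=True : y = f_a(x) for a single uniformly random key a.
    real=False: y is the value at x of a lazily sampled random function.
    """
    key = secrets.token_bytes(n_bytes)
    table = {}  # memoizes the random function so repeated inputs stay consistent
    pairs = []
    for _ in range(q):
        x = secrets.token_bytes(ell_bytes)  # the attacker does NOT choose x
        if real:
            y = f(key, x)
        else:
            y = table.setdefault(x, secrets.token_bytes(32))
        pairs.append((x, y))
    return pairs
```

A distinguisher in this game receives only the list of pairs and must guess which of the two worlds produced it.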

The main contribution in this section is in presenting the following construc- 
tion of a simple computational extractor from any wPRF and demonstrating its 
practical security. 


wPRF-based computational extractor. Let ℱ = {f_a : {0,1}^ℓ → {0,1}^m}_{a ∈ {0,1}^n} 
be a wPRF family. We define the extractor F : {0,1}^n × {0,1}^ℓ → {0,1}^{m+ℓ} as 
F(a, s) = (f_a(s), s). 
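
The definition can be sketched in a few lines. The sketch below assumes HMAC-SHA256 as the wPRF and assumes n = 256 and ℓ = 128; these choices, the zero-padding rule, and the function names are illustrative assumptions rather than anything fixed by the paper.

```python
import hashlib
import hmac
from typing import Tuple

N_BYTES = 32     # assumed wPRF key length n = 256 bits
SEED_BYTES = 16  # assumed seed length ell = 128 bits

def wprf(key: bytes, x: bytes) -> bytes:
    """Stand-in wPRF f_key(x); HMAC-SHA256 is an assumption."""
    return hmac.new(key, x, hashlib.sha256).digest()

def extract(a: bytes, s: bytes) -> Tuple[bytes, bytes]:
    """F(a, s) = (f_a(s), s): the source sample a keys the wPRF,
    and the public seed s is the wPRF input."""
    if len(a) > N_BYTES:
        raise ValueError("source sample must fit in the wPRF key")
    padded = a.ljust(N_BYTES, b"\x00")  # pad the sample to n bits (assumed padding rule)
    return wprf(padded, s), s
```

Because the extractor is strong, the seed s is part of the output and may be public.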


Theorem 5. If ℱ = {f_a : {0,1}^ℓ → {0,1}^m}_{a ∈ {0,1}^n} is an (S, q, ε)-weak PRF with 
q² < ε · 2^{ℓ+1}, then for k < n the extractor F defined above is a (k, S', ε') strong 
(and proper) computational extractor with ε' ≈ ε · 2^{n-k} and S' ≈ S · ε'. 


Proof. See full version [DGKM11]. 


⁷ We assume m > n; if this is not the case in the given family F we can achieve it 
using standard range expansion techniques to increase m, possibly at the cost of 
somewhat strengthening the weak PRF requirement. 
7.1 Application to Key Derivation 


A main application of a strong computational extractor in cryptography is for 
key derivation [Kra10]. In this case, the source distribution is some key material, 
derived from some statistical process or a key agreement protocol, that has some 
significant amount of min-entropy but is not uniformly random as needed to key 
cryptographic functions. Thus, we need a way to produce a cryptographic key 
(random or pseudorandom) out of this key material. This is where our compu- 
tational extractor is useful. One restriction is that if we use a wPRF whose key 
size is n we need to consider sources of key material whose length is at most 
n. In this case, we simply use the key material (without any processing, except 
maybe for padding to n bits) as the key to the wPRF and choose as the input 
to the wPRF a random value of length ℓ. The latter is the seed of the extractor, 
and it is assumed that the application provides such a random but public “salt” (see 
for discussions on this issue). Next we show concrete examples of the ap- 
plicability of this method when the key material is derived from a Diffie-Hellman 
value (as is common in the settings of key exchange and ElGamal encryption). 

We are given an (S, q, ε)-wPRF and consider the ratio S/ε as its measure of 
security (here S is a function of ℓ and q). Assume that the wPRF has full 
security, i.e., for a key of size n we have S/ε ≈ 2^n. In this case, Theorem 5 
guarantees that the extractor F (the KDF in our application) has parameters 
(S', ε') such that: 


ε' ≈ ε · 2^{n-k} and S' ≈ S · ε' ≈ 2^n · ε · ε' ≈ 2^k · ε'². 


As a concrete example, consider the case of a wPRF with a 256-bit key and 
security S/ε = 2^{256} (this would apply, given current knowledge, to a PRF based 
on SHA-256, especially since we only consider attacks where the attacker cannot 
choose any inputs; it only sees the function applied to a set of random values). 
Assume now that the key, instead of being sampled uniformly at random, follows 
a distribution with min-entropy k = 160; this is the case, for example, when the 
key material is a Diffie-Hellman value computed over an elliptic curve of size 
2^{160} [GKR04]. In this case we have that to distinguish the 256-bit output of 
the extractor from random with advantage ε' ≈ 2^{-40} we must invest S' ≈ 2^{80}. 
If we want to double the advantage ε' we need to invest four times more work 
(circuit size). For example, to obtain ε' = 2^{-20} we need to work S' = 2^{120} and 
for ε' = 0.001 one needs S' = 2^{140}. Even if we consider a less-perfect function, 
say S/ε = 2^{200}, one still gets S' = 2^{64} for ε' = 2^{-20} and S' = 2^{84} for ε' = 0.001. 
Note that in all these cases we are outputting more pseudorandom bits (256) 
than the source entropy (160). 
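
The figures in this example follow from the two relations ε' ≈ ε · 2^{n-k} and S' ≈ S · ε'. The helper below is a small sketch of that arithmetic in log2 scale; the function name and parameter names are hypothetical, and 0.001 is rounded to 2^{-10} as in the text.

```python
def log2_work(log2_s_over_eps: float, n: int, k: int, log2_eps_prime: float) -> float:
    """Return log2(S') for a target distinguishing advantage eps'.

    From eps' ~ eps * 2^(n-k) we get eps = eps' * 2^(k-n).
    With S = eps * 2^(log2_s_over_eps) and S' ~ S * eps', this gives
    S' ~ eps'^2 * 2^(log2_s_over_eps - n + k).
    """
    return log2_s_over_eps - n + k + 2 * log2_eps_prime

# Full-strength wPRF (S/eps = 2^256), 256-bit key, 160 bits of min-entropy.
full = [log2_work(256, 256, 160, e) for e in (-40, -20, -10)]
# Weaker wPRF (S/eps = 2^200), same key size and entropy.
weaker = [log2_work(200, 256, 160, e) for e in (-20, -10)]
```

Doubling ε' adds 1 to log2 ε' and therefore 2 to log2 S', which is the "four times more work" observation above.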

In comparison, if we were applying a statistical extractor to the key material 
of min-entropy 160 to obtain a key of size 256, we could not claim any security 
at all (this is the case even if we only needed 160 bits of output, and we would 
get security of only 2^{-16} if we were outputting a 128-bit key). In comparing with 
statistical extractors another main advantage of our PRF-based computational 
extractor is the fact that PRFs are already available in practical cryptographic 
protocols for other uses (including key expansion as often needed in the context 
of key derivation) and hence do not require additional mechanisms such as a 
statistical extractor. 


Related Schemes. It is worth noting the duality between the above KDF 
construction and the HKDF scheme from [Kra10]. In our case, the imperfect key 
material is used to key the (weak) PRF and the seed is used as an input to 
the KDF. In HKDF these roles are reversed. This gives HKDF the advantage of 
being appropriate for input distributions of arbitrary length while in our scheme 
we are limited to the key size. On the other hand, the very non-standard use of 
a known value (the seed) as a key to a PRF in the HKDF scheme makes the 
latter much more restricted in the type of PRFs one can use (actually, the known 
analysis of HKDF is for particular PRFs, mainly HMAC, and under dedicated 
assumptions). In contrast, our scheme can use any PRF and even any wPRF. 
The recent work of Barak et al. builds a computational extractor in 
the traditional way, namely, using a statistical extractor to get a close-to-uniform 
key and using a PRG or PRF to get additional pseudorandom bits as needed. 
The novelty of that work, however, is that they show that if the output from the 
statistical extractor (implemented via a suitable hash function) is used as a key 
to a wPRF and this wPRF is applied to a random point then the best possible 
distinguishing advantage against the output of this scheme is the wPRF’s best 
distinguishing advantage plus 2^{-(k-m)}. This is an improvement over the generic 
analysis using statistical extractors where the latter term would be 2^{-(k-m)/2}. 
This relaxes the entropy requirement from the source and is significant in cases 
such as those considered above (e.g., when generating keys from Diffie-Hellman 
protocols of relatively small order). Moreover, depending on the security parameters, 
the analysis of Barak et al. can sometimes be used, as in our case, to generate 
keys that are even larger than the available entropy. The crucial difference with 
our construction, however, is that the scheme of Barak et al. requires the 
implementation of a statistical extractor (with its corresponding seed) in addition 
to the wPRF. In contrast, our scheme re-uses the PRF already available in most 
cryptographic implementations without requiring extra machinery (which may seem 
a minor issue considering the relative simplicity of statistical extractors but 
represents a significant barrier for adoption into standardized protocols, 
particularly those requiring hardware support). On the downside, our scheme is 
limited to situations where the source of key material produces values that are no 
longer than the key of the wPRF, while the scheme of Barak et al. has no such 
length restrictions. 
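
The role reversal between the KDF of this section and HKDF can be sketched with HMAC-SHA256 playing the PRF in both. This is only an illustration: the function names, the 32-byte limit, and the choice of HMAC for the wPRF-based scheme are assumptions, while the HKDF-Extract step follows its definition in RFC 5869 (salt as HMAC key, key material as message).

```python
import hashlib
import hmac

MAX_KEY_BYTES = 32  # assumed PRF key size n = 256 bits

def kdf_source_as_key(key_material: bytes, salt: bytes) -> bytes:
    """KDF of this section: the key material keys the PRF, the public salt is its input."""
    if len(key_material) > MAX_KEY_BYTES:
        raise ValueError("key material must be no longer than the PRF key")
    return hmac.new(key_material, salt, hashlib.sha256).digest()

def hkdf_extract(salt: bytes, key_material: bytes) -> bytes:
    """HKDF-Extract: the public salt keys HMAC, the key material is the input,
    so the key material may have arbitrary length."""
    return hmac.new(salt, key_material, hashlib.sha256).digest()
```

The two functions compute the same primitive with the arguments swapped, which is exactly the duality described above, together with the length restriction it imposes on the first scheme.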
Abstract. We introduce a natural cryptographic functionality called 
functional re-encryption. Informally, this functionality, for a public-key 
encryption scheme and a function F with n possible outputs, transforms 
(“re-encrypts”) an encryption of a message m under an “input public 
key” pk into an encryption of the same message m under one of the n 
“output public keys”, namely the public key indexed by F(m). 

In many settings, one might require that the program implementing 
the functional re-encryption functionality should reveal nothing about 
both the input secret key sk as well as the function F. As an example, 
consider a user Alice who wants her email server to share her incoming 
mail with one of a set of n recipients according to an access policy spec- 
ified by her function F, but who wants to keep this access policy private 
from the server. Furthermore, in this setting, we would ideally obtain 
an even stronger guarantee: that this information remains hidden even 
when some of the n recipients may be corrupted. 

To formalize these issues, we introduce the notion of collusion-resistant 
obfuscation and define this notion with respect to average-case secure 
obfuscation (Hohenberger et al. - TCC 2007). We then provide a con- 
struction of a functional re-encryption scheme for any function F with a 
polynomial-size domain and show that it satisfies this notion of collusion- 
resistant obfuscation. We note that collusion-resistant security can be 
viewed as a special case of dependent auxiliary input security (a setting 
where virtually no positive results are known), and this notion may be 
of independent interest. 

Finally, we show that collusion-resistant obfuscation of functional re- 
encryption for a function F gives a way to obfuscate F in the sense of 
Barak et al. (CRYPTO 2001), indicating that this task is impossible for 
arbitrary (polynomial-time computable) functions F. 


1 Introduction 


Informally, a program obfuscator is an algorithm that transforms a program into 
another, functionally equivalent program whose inner workings are “completely 
unintelligible”. Starting from the formalization of program obfuscation in the work 
of Barak, Goldreich, Impagliazzo, Rudich, Sahai, Vadhan and Yang [3], the problem 
has received considerable attention in the cryptographic community. A method of 
obfuscating programs is an exceedingly valuable tool, both in theory and practice. 

Despite its potential for far-reaching applications, the area of program 
obfuscation is fraught with impossibility results. The seminal work of Barak et al. [3] 
demonstrated a class of circuits which cannot be obfuscated even under a weak 
notion of obfuscation, thereby diminishing the hope of achieving general-purpose 
obfuscation. Further impossibility results for obfuscation of more natural 
functionalities were shown in [14,27,18,4]. Positive results for obfuscation, on the 
other hand, have been largely limited to relatively simple classes of functions 
such as point functions [7,9,21,27,14,8], proximity testing [12], encrypted 
permutations [1] and, more recently, testing hyperplane membership [10]. 

In one of the few exceptions to this trend, Hohenberger et al. [19] showed how 
to obfuscate a complex cryptographic functionality called re-encryption [5,2]. 
Informally, a re-encryption program associated with two public keys transforms 
an encryption of a message m under the first of these keys to an encryption 
of the same message m under the second public key. Hohenberger et al. (and 
independently, [18]) also introduce a strong definition of (average-case) secure 
obfuscation which we will use and build on in this work. Following [19], Hada [17] 
showed how to securely obfuscate an encrypted signature functionality. 

Despite the slow and steady stream of positive results for obfuscation, we have 
relatively few techniques and paradigms for obfuscation. In particular, 


- The key point that enables obfuscation in both [19] and [17] is that they 
obfuscate functionalities that compute a function “inside a ciphertext”. For 
example, in [19], this is the decryption function and in [17], it is the signature 
function. Not surprisingly, it has been noted that given a fully homomorphic 
encryption scheme [22,13], the functionalities of [19,17] can be easily 
obfuscated. Thus, we would like to find other paradigms for obfuscating complex 
functionalities. 

- Both re-encryption and obfuscated signatures can be thought of as access 
control mechanisms. The catch, though, is that both of them embody an 
“all-or-nothing” form of access control: for example, in the case of re-encryption, 
neither the re-encryptor nor the recipient alone can decrypt a ciphertext 
created by the initiator although together, the two of them can learn the 
entire contents of the ciphertext. We would like to consider functionalities 
that capture a finer grained delegation of access. 

- An issue that is important in both theory and practice is the presence of 
auxiliary inputs. Most positive results on obfuscation (including [19,17], but 
also others) do not achieve any form of security against auxiliary inputs that 
depend on the function being obfuscated. Indeed, this task seems quite hard, 
as indicated by known impossibility results (for some limited positive results 
against auxiliary inputs, see [4]). Can we achieve obfuscation against a large, 
meaningful class of auxiliary inputs? 


In this work, we make progress on the above lines of inquiry. Firstly, we relax 
(somewhat) the definition of secure obfuscation in the presence of auxiliary in- 
puts, and introduce the notion of collusion resistant obfuscation. Secondly, we 
show how to obfuscate a natural and complex cryptographic functionality called 
functional re-encryption in a way that satisfies this notion of security. This func- 
tionality captures a finer grained delegation of access, and also protects against 
collusion between various participating parties. 


1.1 Collusion Resistant Obfuscation 


Consider the following scenario. A department would like to create a login pro- 
gram that will grant access to several users - say, Alice, Bob, and Carol, who 
have different passwords. The department would like to obfuscate this program 
and give it to the server that will run it. Now, we would like to guarantee that 
this obfuscation remains secure even if, for example, Alice were to collude with 
the server. One can view Alice’s password as being specific auxiliary information 
that an adversary obtains about the program. Note that this is a restricted form 
of auxiliary information as we do not allow an adversary to learn, say, specific 
bits of Bob’s or Carol’s passwords. In this work, we are interested in the notion of 
average-case secure obfuscation (as defined by [19,18]) and hence in the above 
example we assume that all passwords are chosen uniformly at random. 

One can generalize the above functionality and obtain a general definition 
of collusion resistant obfuscation. We would like to obfuscate a function family 
{C_λ} that has the following form. Any C_K ∈ C_λ is parameterized by a set of 
“secret” keys K = {k_1, k_2, ..., k_ℓ} (in addition to any other parameters that the 
circuit might take) that are chosen at random from some specified distribution. 
Now, define a subset of keys represented through a set of indices T ⊆ [ℓ]. ([ℓ] 
denotes the set {1, 2, ..., ℓ}.) We would like to construct an obfuscation of the 
circuit, denoted by Obf(C_K), so that Obf(C_K) is a “secure obfuscation” of C_K (in 
the sense of [19]) even against an adversary that knows the set of keys {k_i}_{i∈T}. 


1.2 Functional Re-encryption 


Functional re-encryption is an expressive generalization of re-encryption [5,2]. 
A functional re-encryption functionality is parameterized by a policy function 
F : D → [n] (i.e., F has domain D and has n possible outputs) chosen from 
some class of functions, an input public key pk, and n output public keys. The 
functionality receives as input a ciphertext of message m with “identity” id under 
the input public key pk.¹ It decrypts the ciphertext using the secret key sk to 
get m and id, and then re-encrypts m under the “appropriate” output public 
key pk_{F(id)}. Following our desiderata from before, one could think of functional 
re-encryption as a form of fine-grained delegation of access. 
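
As a point of reference, the unobfuscated functionality just described can be sketched as follows. This is a toy illustration only: it uses textbook ElGamal over a small prime field with insecure parameters, encrypts the identity and the payload as two separate ciphertexts, and all names (keygen, enc, dec, make_reencryptor) are hypothetical. It is not the scheme constructed in the paper.

```python
import secrets

P = 2**61 - 1  # toy prime modulus (insecure, for illustration only)
G = 3          # fixed base for the toy ElGamal scheme

def keygen():
    """Return (sk, pk) with pk = G^sk mod P."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def enc(pk, m):
    """Textbook ElGamal encryption of m in [1, P-1]."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def dec(sk, ct):
    """Recover m = c2 * c1^(-sk) mod P."""
    c1, c2 = ct
    return (c2 * pow(c1, P - 1 - sk, P)) % P

def make_reencryptor(sk_in, policy, out_pks):
    """Unobfuscated program: it holds sk_in and the policy F in the clear."""
    def reencrypt(ct_id, ct_m):
        ident = dec(sk_in, ct_id)                  # recover the identity id
        m = dec(sk_in, ct_m)                       # recover the payload m
        return enc(out_pks[policy[ident]], m)      # re-encrypt under pk_{F(id)}
    return reencrypt
```

Anyone holding this program learns sk and F outright, which is why the goal of the paper is an obfuscated version of it.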

To motivate the functional re-encryption functionality, consider the following 
scenario: Alice wishes to have her e-mail server “route” her incoming mail to 
one of a set of n recipients. The particular recipient to which the ciphertext 
should be routed depends on both the contents of the ciphertext — essentially, 


¹ This is a slight generalization of the description given earlier in the abstract where 
the function F is applied to the entire message. We choose to view the message as an 
identity on which the function F is applied, and a separate “payload” for conceptual 
cleanliness. 


Functional Re-encryption and Collusion-Resistant Obfuscation 


the identity id — as well as Alice’s access policy encoded by her function F. The 
e-mail server does this by “re-encrypting” the contents of the ciphertext under 
the appropriate public key. The minimal requirement from such a system is that 
the “re-encryption mechanism” hide both the message and Alice’s access policy: 
it should merely provide a means for the server to do the appropriate routing.² 

One (not particularly appealing) way for Alice to do this would be to give the 
e-mail server her secret key and her access policy; this lets the server decrypt 
all incoming messages and determine where to route them. Unfortunately, this 
“solution” completely fails this minimum requirement. Ideally, Alice would like 
to “obfuscate” the trivial functional re-encryption program above and give it to 
the server. We show how to securely obfuscate functional re-encryption which, 
informally speaking, guarantees that any “attack” that the server can carry out 
given the obfuscated functional re-encryption program, could also be carried out 
given only oracle access to the functional re-encryption program (which is no 
power at all!). 

Furthermore, in reality we could reasonably expect the server to collude with 
some of the recipients to learn additional information about messages or about 
Alice’s access policy function F. Clearly, collusion helps the server — he can use a 
recipient’s decryption key together with the re-encryption program to learn the 
output of F on certain inputs. If we consider the auxiliary input to be the secret 
keys of the colluding recipients, then our strong notion of collusion-resistant 
secure obfuscation guarantees that this is the only information that the server 
could possibly learn by colluding. 

Selectively delegating access is also the central theme of a recently introduced 
notion of predicate encryption (which can be viewed as attribute based 
encryption in which ciphertexts hide their attributes). In fact, (predicate-hiding, 
public key) predicate encryption schemes can potentially be used to solve Alice’s 
dilemma. This is done by completely ignoring the email server and giving each 
of the recipients a “little secret key” that is just powerful enough to decrypt the 
appropriate ciphertexts (dictated by the access policy). Aside from the fact that 
there are no known public-key predicate hiding encryption schemes (nor even 
good definitions of them), this solution has two drawbacks — first, there is no 
way to revoke access from a recipient other than by having Alice choose a fresh 
key for herself (which could be quite expensive). Second, this solution requires 
all recipients to be aware of the existence of an access policy, while the solution 
based on functional re-encryption is completely invisible to the recipients — they 
continue using their already registered public keys, and they do not even have 
to know of the existence of the functional re-encryption mechanism. 


1.3 Overview of Results and Techniques 


Collusion resistant obfuscation. We define the notion of collusion resistant obfus- 
cation which guarantees security against a natural form of auxiliary inputs. This 


² Of course, since the e-mail server does not know who the recipient is, it either sends 
the resulting ciphertext to all the recipients or publishes it on a bulletin board from 
which the intended recipient can then access it. 
notion of auxiliary input security might be realizable (without random oracles) 
for many common cryptographic tasks. 


Functional Re-encryption. We show, informally: 


Theorem 1 (Informal). Under the Symmetric External Diffie-Hellman 
assumption there exists an encryption scheme such that for any function F : D → R 
with polynomial-sized domain D, there is a collusion-resistant average-case secure 
obfuscation of the functional re-encryption program w.r.t. F. The size of the input 
ciphertext in the encryption scheme is O(|D| · poly(λ)), and the size of the output 
ciphertext is O(poly(λ)) (i.e., independent of the domain and the range of F). 


We now present the ideas behind our construction at a very high level. One 
can think of a functional re-encryption program as a program that must achieve 
two goals - a) it must “hide” the policy function F, and b) it must also “hide” 
the input secret key (that it uses to decrypt the input ciphertext). These two 
goals must simultaneously be achieved while maintaining the right functionality. 
Informally, the main innovation in our work is a technique to hide the policy 
function - this combined with techniques from allows us to achieve both 
goals simultaneously. We shall now describe this first technique in more detail. 
Let G, H, G_T be groups such that there is a bilinear map e : G × H → G_T. Let 
a_1, ..., a_d ∈ Z_q^d be vectors that denote elements in the domain D of function F 
and let â_1, ..., â_n ∈ Z_q denote elements in the range R of F. Now consider a 
function O_F that maps elements in G to elements in G_T in the following way. 
O_F is parameterized by random generators g ∈ G and h ∈ H. On input g^{a_i}, O_F 
outputs e(g,h)^{â_{F(i)}}. We shall now informally sketch how to publish a program 
that achieves the functionality provided by O_F, but at the same time hides F. 
The program computes a vector α ∈ Z_q^d such that the inner product ⟨a_i, α⟩ = 
â_{F(i)} for all i. Note that this is indeed possible as α is a solution to a system of d 
equations in d variables. The program description simply contains h^α. (This can 
be computed given only h^{â_{F(i)}} and a_i for all i, so we do not actually need the 
recipient secret keys â_{F(i)}.) On input g^{a_i}, the program computes and outputs 

∏_{j=1}^{d} e(g^{a_{i,j}}, h^{α_j}) = e(g,h)^{⟨a_i, α⟩} = e(g,h)^{â_{F(i)}}, which is the output as desired. 

Unfortunately, this solution does not completely hide the function. Note that 
if F(1) = F(2) (say), then an adversary can learn this by simply running the 
above program and checking if the output is the same on both the inputs. To 
get around this problem, we modify the program in the following way. The 
program picks random w_i, for all i, and computes two vectors α, β ∈ Z_q^d such 
that the inner product ⟨a_i, α⟩ = w_i · â_{F(i)} and ⟨a_i, β⟩ = w_i, for all i (in our actual 
solution we require the R.H.S. of the second equation to be w_i - 1 instead of 
w_i, but we will ignore that for now). The program description now contains 
h^α, h^β. On input g^{a_i}, the program computes and outputs ∏_j e(g^{a_{i,j}}, h^{α_j}) = 
e(g,h)^{w_i · â_{F(i)}}, as well as ∏_j e(g^{a_{i,j}}, h^{β_j}) = e(g,h)^{w_i}. Now, on two different 
inputs (of F) that have the same output, the above program outputs elements 
of the form (e(g,h)^{ax}, e(g,h)^{x}) and (e(g,h)^{ay}, e(g,h)^{y}), for random a, x and y. 
However, these tuples are indistinguishable from random, even given e(g,h) and 
e(g,h)^{a} (by DDH), and hence an adversary cannot tell if F(1) = F(2). This 
construction now ensures that F is completely hidden. 
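
The linear algebra behind both variants can be checked with a small sketch. Important assumption: the code below works directly with exponents in Z_q, so an element g^x is represented by x and a pairing e(g^x, h^y) by the product x · y mod q. Nothing is hidden in this model; it only verifies the inner-product identities. The prime Q and all function names are hypothetical.

```python
import secrets

Q = 2**61 - 1  # toy prime standing in for the group order q (assumption)

def solve_mod(A, b, q=Q):
    """Solve A x = b over Z_q by Gauss-Jordan elimination (A invertible mod q)."""
    d = len(A)
    M = [[v % q for v in row] + [rhs % q] for row, rhs in zip(A, b)]
    for col in range(d):
        piv = next(r for r in range(col, d) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)  # modular inverse (Python 3.8+)
        M[col] = [(v * inv) % q for v in M[col]]
        for r in range(d):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % q for vr, vc in zip(M[r], M[col])]
    return [M[i][d] for i in range(d)]

def inner(u, v, q=Q):
    """Inner product over Z_q; models prod_j e(g^{u_j}, h^{v_j}) in the exponent."""
    return sum(x * y for x, y in zip(u, v)) % q

def basic_program(a, F, a_hat, q=Q):
    """First attempt: <a_i, alpha> = a_hat[F(i)], so equal outputs reveal F(i) = F(i')."""
    alpha = solve_mod(a, [a_hat[F[i]] for i in range(len(a))], q)
    return lambda a_i: inner(a_i, alpha, q)

def blinded_program(a, F, a_hat, q=Q):
    """Blinded variant: outputs the exponents (w_i * a_hat[F(i)], w_i) for random w_i."""
    d = len(a)
    w = [secrets.randbelow(q - 1) + 1 for _ in range(d)]
    alpha = solve_mod(a, [w[i] * a_hat[F[i]] for i in range(d)], q)
    beta = solve_mod(a, w, q)
    return lambda a_i: (inner(a_i, alpha, q), inner(a_i, beta, q))
```

In the basic variant two inputs with the same policy value yield identical outputs, while in the blinded variant each output pair (u, v) satisfies u = â_{F(i)} · v with a fresh blinding factor v.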

Now, note that if we let {g^{a_i}}, 1 ≤ i ≤ d, be the input public key and e(g,h)^{â_j} 
be the output public key, then one can potentially use the above construction to 
build a scheme that converts an encryption of message m under g^{a_i} to one under 
e(g,h)^{â_{F(i)}}. This is precisely what we do. Our encryption schemes are 
ElGamal-like: the input encryption key contains a set of vectors g^{a_1}, ..., g^{a_d}, and an input 
encryption of message m with identity i uses the key g^{a_i}. Finally, in order to obtain a 
secure obfuscation, we apply techniques from to re-randomize the ciphertexts. 


Obfuscating Functional Re-encryption for Arbitrary Policy Functions? A natu- 
ral question raised by our result is whether it is possible to achieve collusion- 
resistant obfuscation of functional re-encryption for arbitrary (polynomial-time 
computable) policy functions F (in particular, functions F with domains of 
super-polynomial size). We show that this goal is impossible to achieve. In 
particular, we show that a collusion-resistant obfuscation with respect to a policy 
function F already contains within it a [3]-style obfuscation (a so-called 
“predicate obfuscation”) of the policy function F. In some sense, this is not entirely 
surprising, and corresponds to the intuition that a collusion-resistant 
obfuscation of functional re-encryption allows computation of the function F³ and yet 
hides all internal details of F except the input-output behavior. Together with 
the impossibility result of [3] for obfuscating general (families of) functions, this 
shows that there are classes of (polynomial-time computable) policy functions for 
which it is impossible to construct collusion-resistant secure obfuscation of 
functional re-encryption. See the full version of the paper [11] for a formal statement 
and proof of this result. The next question to ask is whether there is any 
non-trivial policy function (with a domain of super-polynomial size) for which this 
goal can be achieved. We informally argue that this may require some new inno- 
vation on the question of constructing public-key predicate encryption schemes 
which satisfy a strong security notion called predicate-hiding. Predicate encryp- 
tion schemes were defined by Katz, Sahai and Waters [20], following [2305] (in 
particular, the predicate-hiding property was defined in the work of Shi, Shen 
and Waters [25]). Constructions of predicate encryption schemes (even ones that 
do not achieve predicate-hiding) are known only for simple classes of functions 
such as inner products [20]. Moreover, in the public-key setting, we do not know 
how to achieve (any reasonable definition of) predicate-hiding, even for simple 
functions. Since collusion-resistant obfuscation of functional re-encryption seems 
to have the same flavor in functionality as predicate-hiding public-key predicate 
encryption, advancements in the class of policy functions that these primitives 
can handle seem to be correlated. 


³ A collusion-resistant obfuscation of functional re-encryption allows computation of 
the function F since given an output secret key sk_i and the re-encryption program, 
one can test if F(id) = i for any id in the domain of F. Simply encrypt a random 
message with identity id, run it through the re-encryption program and decrypt it 
using sk_i. If this returns the same message that was encrypted, then conclude that 
F(id) = i. 
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2 Collusion Resistant Secure Obfuscation 


2.1 Average-Case Secure Obfuscation 


Throughout this paper, we will implicitly assume that the adversary (as well as 
simulator) can obtain arbitrary polynomial-size independent auxiliary input z. 
We remark that our construction is secure even against the presence of such aux- 
iliary information. We now recall the notion of average-case secure obfuscation 
introduced in [19] below. 


Definition 1. An efficient algorithm Obf that takes as input a (probabilistic)
circuit C from the family $\{\mathcal{C}_\lambda\}$ and outputs a new
(probabilistic) circuit is an average-case secure obfuscator if it satisfies the
following properties:

- Preserving functionality: With overwhelming probability Obf(C) behaves
  “almost identically” to C on all inputs. Formally, there exists a negligible
  function $\mathrm{neg}(\lambda)$ such that for any input length $\lambda$ and
  any $C \in \mathcal{C}_\lambda$:

  $$\Pr_{\text{coins of } \mathsf{Obf}}\left[\exists x \in \{0,1\}^* : C' \leftarrow \mathsf{Obf}(C);\ \mathrm{SD}(C'(x), C(x)) \ge \mathrm{neg}(\lambda)\right] \le \mathrm{neg}(\lambda)$$

  where SD(X, Y) denotes the statistical distance between two distributions X
  and Y.

- Polynomial slowdown: There exists a polynomial $p(\lambda)$ such that for
  sufficiently large input lengths $\lambda$, for any $C \in \mathcal{C}_\lambda$,
  the obfuscator Obf only enlarges C by a factor of p. That is,
  $|\mathsf{Obf}(C)| \le p(|C|)$.

- Average-case Virtual Black-Boxness: There exists an efficient simulator S
  and a negligible function $\mathrm{neg}(\lambda)$ such that for every efficient
  distinguisher D and for every input length $\lambda$:

  $$\left|\Pr[C \leftarrow \mathcal{C}_\lambda : D^{C}(\mathsf{Obf}(C)) = 1] - \Pr[C \leftarrow \mathcal{C}_\lambda : D^{C}(S^{C}(1^\lambda)) = 1]\right| \le \mathrm{neg}(\lambda)$$

  The probability is over the selection of a random circuit C from
  $\mathcal{C}_\lambda$, and the coins of the distinguisher, the simulator, the
  oracle, and the obfuscator.


2.2 Average-Case Secure Obfuscation with Collusion 


Consider the case where we would like to obfuscate a function family
$\{\mathcal{C}_\lambda\}$ that has the following particular form. Any
$C_K \in \mathcal{C}_\lambda$ is parameterized by a set of “secret” keys
$K = \{k_1, k_2, \ldots, k_\ell\}$ (in addition to any other parameters that the
circuit might take) that are chosen at random from some specified distribution.
Now, define a (non-adaptively chosen) subset of keys represented through a set of
indices $T \subseteq [\ell]$, where $[\ell]$ denotes the set
$\{1, 2, \ldots, \ell\}$. We would like to construct an obfuscation of the
circuit, denoted by $\mathsf{Obf}(C_K)$, so that $\mathsf{Obf}(C_K)$ is a “secure
obfuscation” of $C_K$ (in the sense of [19]) even against an adversary that knows
the set of keys $\{k_i\}_{i \in T}$.

We accomplish this using a definition that is similar in spirit to the notion of
obfuscation against dependent auxiliary inputs [14]. More precisely, in addition
to their usual inputs and oracles, we give both the adversary and the simulator
access to a (non-adaptively chosen) subset $\{k_i\}_{i \in T} \subseteq K$ of the
keys. This can be seen as auxiliary information about the circuit
$C_K \leftarrow \mathcal{C}_\lambda$. The formal definition of collusion-resistant
secure obfuscation is as follows.


4 This is the definition in [19] but with a dummy adversary. The authors of that paper
note that this is equivalent to the definition they give.


Definition 2. An efficient algorithm Obf that takes as input a (probabilistic)
circuit and outputs a new (probabilistic) circuit is a collusion-resistant
(average-case) secure obfuscator for the family $\{\mathcal{C}_\lambda\}$ if it
satisfies the following properties:

- “Preserving functionality” and “Polynomial slowdown”, as in Definition 1.

- Average-case Virtual Black-Boxness against Collusion: There exists an
  efficient simulator S and a negligible function $\mathrm{neg}(\lambda)$ such
  that for every input length $\lambda$, every efficient distinguisher D, and any
  subset $T \subseteq [\ell]$:

  $$\left|\Pr[C_K \leftarrow \mathcal{C}_\lambda : D^{C_K}(\mathsf{Obf}(C_K), \{k_i\}_{i \in T}) = 1] - \Pr[C_K \leftarrow \mathcal{C}_\lambda : D^{C_K}(S^{C_K}(1^\lambda, \{k_i\}_{i \in T}), \{k_i\}_{i \in T}) = 1]\right| \le \mathrm{neg}(\lambda)$$

  The probability is over the selection of a random circuit $C_K$ from
  $\mathcal{C}_\lambda$, and the coins of the distinguisher, the simulator, the
  oracle, and the obfuscator.


Remarks on the Definition. An even stronger attack model allows the adversary
to obtain an obfuscation of a circuit $C_K$ where some of the keys in
$\{k_i\}_{i \in T}$ are adversarially chosen. Furthermore, one could allow the
adversary to select the set $T$ adaptively, after seeing the public keys and/or
the obfuscated program. We postpone a full treatment of these issues to future
work.


2.3 Securely Obfuscating Functional Re-encryption 


We would like to obtain a collusion-resistant average-case obfuscator for the
functional re-encryption functionality. A Functional Re-encryption (FR)
functionality associated to function $F : D \to R$, input public/secret key pair
(pk, sk), and output public keys $\hat{pk}_1, \ldots, \hat{pk}_{|R|}$ is a
functionality that takes as input a ciphertext c = I-Enc(pk, id, m) and
re-encrypts m under the output public key $\hat{pk}_{F(id)}$. More precisely, for
a given function $F : D \to R$, we are interested in the family of circuits
$\mathrm{FR}_{F,D,R} = \{\mathrm{FR}_{\lambda,F,D,R}\}_{\lambda > 0}$ where each
circuit $C_{pk,sk,\hat{pk}_1,\ldots,\hat{pk}_{|R|}} \in \mathrm{FR}_{\lambda,F,D,R}$
is a probabilistic circuit indexed by a key pair
$(pk, sk) \leftarrow \text{I-Gen}(1^\lambda)$ and public keys
$(\hat{pk}_i, \cdot) \leftarrow \text{O-Gen}(1^\lambda)$, and works as follows:

- $C_{pk,sk,\hat{pk}_1,\ldots,\hat{pk}_{|R|}}$, on input c:
  Computes $(id, m) \leftarrow \text{I-Dec}(sk, c)$, and outputs
  $\hat{c} \leftarrow \text{O-Enc}(\hat{pk}_{F(id)}, m)$.
  If I-Dec(sk, c) returns $\bot$, then outputs random elements in the format of
  $\hat{c}$.

- $C_{pk,sk,\hat{pk}_1,\ldots,\hat{pk}_{|R|}}$, on a special input keys:
  Outputs $(pk, \hat{pk}_1, \ldots, \hat{pk}_{|R|})$.


5 Without loss of generality, and for simplicity of notation, throughout the paper we
will often assume that the domain D = {1,2,...,d} and the range R = {1,2,...,n}.


Now, for a class of functions $\mathcal{F}$, we say that a re-encryption program
securely obfuscates re-encryption for $\mathcal{F}$ if there exists a simulator S
that satisfies the collusion-resistant obfuscation property w.r.t.
$\mathrm{FR}_{F,D,R}$ for all $F : D \to R$ in $\mathcal{F}$.

In other words, all public keys in the circuit
$C_{pk,sk,\hat{pk}_1,\ldots,\hat{pk}_{|R|}}$ are considered public knowledge; the
only pieces of information we are interested in protecting are the input secret
key sk and the function F. Also, note that we are interested in guaranteeing
security for arbitrarily chosen F, not F chosen at random.

The set of secret keys that will parameterize a functional re-encryption
functionality is $K = \{\hat{sk}_1, \ldots, \hat{sk}_{|R|}\}$. The definition of
collusion-resistant average-case secure obfuscation guarantees security against
an adversary who not only knows the re-encryption program, but also has access
to a subset $\{\hat{sk}_i\}_{i \in T} \subseteq K$ of the output secret keys. This
scenario endows the adversary with considerable power and knowledge. For
instance,


- The adversary will inevitably be able to decrypt all ciphertexts
  c = I-Enc(pk, id, m) where $F(id) \in T$, simply by using the re-encryption
  program to convert the ciphertext c into an encryption of m under the output
  public key $\hat{pk}_{F(id)}$, and then decrypting it using $\hat{sk}_{F(id)}$.

- Moreover, the power to selectively decrypt a subset of the input ciphertexts
  gives the adversary information about the access policy function F itself.
  For instance, the adversary can determine if F(id) = i whenever $i \in T$.

- Finally, we remark that the definition of obfuscation for functional
  re-encryption by itself does not guarantee the semantic security of the input
  and output encryption schemes. We define these separately and prove the
  security of the encryption schemes (even in the presence of the re-encryption
  program). In more detail, we will require the semantic security of the input
  encryption scheme on messages encrypted with an identity id* whenever
  $F(id^*) \notin T$, even when the adversary is given access to a re-encryption
  oracle. We will similarly require that the input ciphertext hides the identity
  id* under which the message is encrypted. The security of the output
  encryption scheme will be that of standard semantic security. Since we wish
  to hide everything about the function F, we will also require the output
  encryption scheme to be key private; i.e., an encryption under one output
  public key will be indistinguishable from an encryption under another output
  public key. For formal definitions and proofs of these properties, see the
  full version [11].


3 Preliminaries 


We let $\lambda$ be the security parameter throughout this paper. By
$\mathrm{neg}(\lambda)$ we denote some negligible function, namely a function
$\mu$ such that for all $c > 0$ and all sufficiently large $\lambda$,
$\mu(\lambda) < 1/\lambda^c$. For two distributions $D_1$ and $D_2$,
$D_1 \approx D_2$ means that they are computationally indistinguishable (to be
precise, this statement holds for ensembles of distributions).

We let $[\ell]$ denote the set $\{1, \ldots, \ell\}$. We denote vectors by
bold-face letters, e.g., $\mathbf{a}$. Let G be a group of prime order q. For a
vector $\mathbf{a} = (a_1, a_2, \ldots, a_\ell) \in \mathbb{Z}_q^\ell$ and group
element $g \in G$, we write $g^{\mathbf{a}}$ to mean the vector
$(g^{a_1}, g^{a_2}, \ldots, g^{a_\ell})$. For two vectors $\mathbf{a}$ and
$\mathbf{b}$ that are either both in $\mathbb{Z}_q^\ell$ or both in $G^\ell$, we
write $\mathbf{a}\mathbf{b}$ to denote their component-wise product and
$\mathbf{a}/\mathbf{b}$ to denote their component-wise division. In case
$\mathbf{b} \in \mathbb{Z}_q^\ell$, we let $\mathbf{a}^{\mathbf{b}}$ denote their
component-wise exponentiation. For a vector $\mathbf{a}$ and scalar x,
$x\mathbf{a} = \mathbf{a}\mathbf{b}$, $\mathbf{a}/x = \mathbf{a}/\mathbf{b}$, and
$\mathbf{a}^x = \mathbf{a}^{\mathbf{b}}$, where $\mathbf{b} = (x, x, \ldots, x)$
is of dimension $\ell$.
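
The vector notation above can be made concrete with a short sketch. The Python
snippet below is an illustration only: it assumes a toy group (the order-11
subgroup of $\mathbb{Z}_{23}^*$ generated by 2), and the helper names `vec_exp`,
`vec_mul`, `vec_div`, `vec_pow` and `broadcast` are hypothetical, not taken from
the paper. It requires Python 3.8 or later for modular inverses via `pow`.

```python
# Toy parameters (assumed for illustration): the subgroup of Z_23^* of order 11.
P = 23   # modulus of the ambient group Z_P^*
Q = 11   # prime order q of the subgroup G
G = 2    # generator g of G, since 2**11 % 23 == 1


def vec_exp(g, a):
    """g^a for a vector a in Z_q^l: the vector (g^a_1, ..., g^a_l)."""
    return [pow(g, ai % Q, P) for ai in a]


def vec_mul(a, b):
    """Component-wise product ab of two vectors of group elements."""
    return [x * y % P for x, y in zip(a, b)]


def vec_div(a, b):
    """Component-wise division a/b of two vectors of group elements."""
    return [x * pow(y, -1, P) % P for x, y in zip(a, b)]


def vec_pow(a, b):
    """Component-wise exponentiation a^b, with a in G^l and b in Z_q^l."""
    return [pow(x, y % Q, P) for x, y in zip(a, b)]


def broadcast(x, length):
    """The vector (x, ..., x) that defines operations with a scalar x."""
    return [x] * length
```

For example, `vec_exp(G, [1, 2, 3])` gives the vector $(g, g^2, g^3)$, and raising
it to the scalar 2 is `vec_pow(v, broadcast(2, 3))`, which agrees with
`vec_mul(v, v)`.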


Assumptions. We assume the existence of families of groups
$\{G^{(\lambda)}\}_{\lambda>0}$, $\{H^{(\lambda)}\}_{\lambda>0}$ and
$\{G_T^{(\lambda)}\}_{\lambda>0}$ with prime order $q = q(\lambda)$, endowed with
a bilinear map $e^{(\lambda)} : G^{(\lambda)} \times H^{(\lambda)} \to G_T^{(\lambda)}$.
When clear from the context, we omit the superscript that refers to the security
parameter from all these quantities. The mapping is efficiently computable, and
is bilinear: for any generators $g \in G$ and $h \in H$, and
$a, b \in \mathbb{Z}_q$, $e(g^a, h^b) = e(g, h)^{ab}$. We also require the
bilinear map to be non-degenerate, in the sense that if $g \in G$, $h \in H$
generate G and H respectively, then $e(g, h) \ne 1$.

We assume the Symmetric External Diffie-Hellman Assumption (SXDH), which says
that the decisional Diffie-Hellman (DDH) problem is hard in both of the groups G
and H. That is, when
$(q, G, H, G_T, e) \leftarrow \mathrm{BilinSetup}(1^\lambda)$; $g \leftarrow G$;
$a, b, c \leftarrow \mathbb{Z}_q$, the following two ensembles are
indistinguishable:

$$\{(q, G, H, G_T, e, g, g^a, g^b, g^{ab})\} \approx \{(q, G, H, G_T, e, g, g^a, g^b, g^c)\}$$

and a similar statement holds when $g \in G$ is replaced with $h \in H$. In
contrast, the assumption that DDH is hard in one of the two groups G or H is
simply called the external Diffie-Hellman assumption (XDH). These assumptions
were first proposed and used in various works, including [26,6,24,16]. In this
work, we use the SXDH assumption.


4 Collusion-Resistant Functional Re-encryption 


We are now ready to present our construction of a functional re-encryption
scheme from the symmetric external Diffie-Hellman (SXDH) assumption. We
first construct our basic encryption schemes in Section 4.1. In Section 4.2 we
describe a program that implements the functional re-encryption scheme. Finally,
in Section 4.3 we prove that our functional re-encryption program satisfies the
notion of collusion-resistant average-case secure obfuscation.


4.1 Construction of the Encryption Schemes 


A functional re-encryption scheme transforms a ciphertext under an input public
key into a ciphertext of the same message under one of many output public keys.
In our construction, the input and the output ciphertexts have different shapes:
the input ciphertext lives in the “source group” $G$ whereas the output
ciphertext lives in the “target group” $G_T$. We now proceed to describe our
input and output encryption schemes, which are both variants of the ElGamal
encryption scheme.

Parameters. The public parameters for both the input and the output encryption
scheme consist of the description of three groups $G$, $H$ and $G_T$ of prime
order $q = q(\lambda)$, with a bilinear map $e : G \times H \to G_T$. Also
included in the public parameters are two generators, $g \in G$ and $h \in H$.
Let $M = M(\lambda) \subseteq G$ denote the message space of both the input and
output encryption schemes. We assume that $|M|$ is polynomial in $\lambda$. The
construction of our output encryption scheme requires this to be the case;
however, one can encrypt longer messages by breaking the message into smaller
blocks and encrypting the blocks separately.

The Input Encryption Scheme. We first construct the input encryption scheme,
which is parameterized by $d = d(\lambda)$, an upper bound on the size of the
domain of the policy function that we intend to support. We will also use a NIZK
proof system; we note that [16] provides an efficient scheme for the type of
statements we use, which is perfectly sound and computationally zero-knowledge
based on SXDH. We remark that, while the semantic security of the input
encryption scheme does not require this NIZK proof, the obfuscation guarantee
provided by our construction relies on it: if, for example, the adversary were to
provide an invalid ciphertext as input to the re-encryption program (e.g., by
combining two valid ciphertexts with different i's), the program might output
some group elements that are distinguishable from random to an adversary that
possesses some of the recipient secret keys.
The input encryption scheme is as follows: 


1. I-Gen($1^\lambda, 1^d$): Pick random vectors $\mathbf{a}_1, \ldots, \mathbf{a}_d$
   from $\mathbb{Z}_q^d$ that are linearly independent. We also generate crs, a
   common reference string (abbreviated CRS) for the NIZK proof system. Output
   $pk = (crs, g, g^{\mathbf{a}_1}, \ldots, g^{\mathbf{a}_d})$ and
   $sk = (\mathbf{a}_1, \ldots, \mathbf{a}_d)$. We remark that the public key pk
   can be viewed as being made up of d public keys $pk_i = (g, g^{\mathbf{a}_i})$
   of a simpler scheme.

2. I-Enc($pk, i \in [d], m$): To encrypt a message $m \in M$ with “identity”
   $i \in [d]$, choose random exponents $r$ and $r'$ from $\mathbb{Z}_q$, and
   compute:

   (a) $\mathbf{C} = g^{r\mathbf{a}_i}$; $D = g^{r} m$, and

   (b) $\mathbf{C}' = g^{r'\mathbf{a}_i}$; $D' = g^{r'}$

   (c) $\pi$, a proof that these values are correctly formed, i.e. that they
       correspond to one of the vectors $g^{\mathbf{a}_i}$ contained in the
       public key.

   Output the ciphertext $(E, E', \pi)$ where $E = (\mathbf{C}, D)$ and
   $E' = (\mathbf{C}', D')$. (Looking ahead, we remark that E looks like an
   encryption of message m under $pk_i$, while $E'$ looks like an encryption of
   $1_G$ under $pk_i$. $E'$ is primarily used by the re-encryption program for
   input re-randomization, and is not required if the encryption scheme is used
   stand-alone without the functional re-encryption program.)

3. I-Dec($sk, (E, E')$): If any of the components of the ciphertext $E'$ is $1_G$
   or if the proof $\pi$ does not verify, output $\bot$ (see footnote 6). Ignore
   $E'$ and $\pi$ subsequently, and parse E as $(\mathbf{C}, D)$. Check that for
   some $i \in [d]$ and $m \in M$,
   $D \cdot (\mathbf{C}^{1/\mathbf{a}_i})^{-1} = (m, \ldots, m)$.
   If yes, output (i, m). Otherwise output $\bot$.

6 This “sanity check” is to ensure the correctness of the re-encryption program. Note
that if (E, E') is honestly generated, this event happens only with negligible
probability.


The Output Encryption Scheme. We now describe the output encryption scheme.


1. O-Gen($1^\lambda$): Pick $\hat{a} \leftarrow \mathbb{Z}_q$. Let
   $pk = h^{\hat{a}}$ and $sk = \hat{a}$.
2. O-Enc(pk, m): To encrypt a message $m \in M \subseteq G$,
   - Choose a random number $r \leftarrow \mathbb{Z}_q$.
   - Compute $Y = (h^{\hat{a}})^r$ and $W = h^r$.
   - Output the ciphertext as $[F, G] := [e(g, Y),\ e(g, W) \cdot e(m, h)]$.
3. O-Dec($sk = \hat{a}, (F, G)$): The decryption algorithm does the following:
   - Compute $Q = G \cdot F^{-1/\hat{a}}$.
   - For each $m \in M$, test if $e(m, h) = Q$. If so, output m and halt. (Note
     that if e(m, h) are precomputed for all $m \in M$, then this step can be
     implemented with a table lookup.)
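
To see why decryption recovers m, the output scheme can be traced in a toy model.
The sketch below is not an implementation of the scheme: it assumes an "exponent
representation" in which every element $g^x$, $h^y$ or $e(g,h)^z$ is stored as its
exponent mod a prime, so the pairing becomes multiplication of exponents. Discrete
logarithms are in the clear and there is no security whatsoever; the model only
checks the algebra. The prime, the function names and the integer encoding of
messages are all assumptions made for illustration.

```python
import secrets

Q = 2**61 - 1  # a Mersenne prime standing in for the group order q

# Assumed toy model: g^x in G, h^y in H and e(g,h)^z in G_T are stored as the
# exponents x, y, z mod Q. Multiplying group elements adds exponents, and
# raising to a power multiplies them. This only checks the algebra.
G_LOG, H_LOG = 1, 1  # exponents of the generators g and h


def pairing(x, y):
    """e(g^x, h^y) = e(g, h)^(x*y), in exponent form."""
    return x * y % Q


def o_gen():
    """O-Gen: pick a_hat and return (pk, sk) = (h^a_hat, a_hat)."""
    a_hat = 1 + secrets.randbelow(Q - 1)  # nonzero so that 1/a_hat exists
    return a_hat * H_LOG % Q, a_hat


def o_enc(pk, m):
    """O-Enc: [F, G] = [e(g, pk^r), e(g, h^r) * e(m, h)]."""
    r = secrets.randbelow(Q)
    Y = pk * r % Q        # (h^a_hat)^r
    W = H_LOG * r % Q     # h^r
    F = pairing(G_LOG, Y)
    G = (pairing(G_LOG, W) + pairing(m, H_LOG)) % Q
    return F, G


def o_dec(sk, ct, message_space):
    """O-Dec: compute G * F^(-1/a_hat), then search M for a matching e(m, h)."""
    F, G = ct
    target = (G - F * pow(sk, -1, Q)) % Q
    for m in message_space:
        if pairing(m, H_LOG) == target:
            return m
    return None
```

In this model $F = \hat{a} r$ and $G = r + m$, so $G - F/\hat{a} = m$, which is
the exponent form of $G \cdot F^{-1/\hat{a}} = e(m, h)$. The linear scan over the
message space mirrors the requirement that $|M|$ be polynomial.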


4.2 Obfuscation for Functional Re-encryption 


We now describe our scheme for securely obfuscating the functional re-encryption 
functionality for the input and output encryption schemes described above. 


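The Functional Re-encryption Key. The obfuscator gets an input secret key sk,
the n output public keys $\hat{pk}_j$, and the description of a function
$F : [d] \to [n]$. It outputs a functional re-encryption key, which is a
description of a program that takes as input a ciphertext of message $m \in M$
and identity $i \in [d]$ under public key pk, and outputs a ciphertext of m under
$\hat{pk}_{F(i)}$.
The obfuscator does the following:
1. Pick $w_i \leftarrow \mathbb{Z}_q$ for all $i \in [d]$ uniformly at random.
2. Solve for $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ and
   $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_d)$ such that for all
   $i \in [d]$:

   $$\langle \mathbf{a}_i, \boldsymbol{\alpha} \rangle = w_i \cdot \hat{a}_{F(i)} \quad\text{and}\quad \langle \mathbf{a}_i, \boldsymbol{\beta} \rangle = w_i - 1$$

The re-encryption key consists of the tuple $(\mathbf{A}, \mathbf{B})$ where
$\mathbf{A} = h^{\boldsymbol{\alpha}}$ and $\mathbf{B} = h^{\boldsymbol{\beta}}$.
We remark that computing the re-encryption key does not require knowledge of the
output secret keys. To compute $h^{\boldsymbol{\alpha}}$, one can take the output
public keys $h^{\hat{a}_1}, \ldots, h^{\hat{a}_n}$, and with the knowledge of the
input secret keys $\{\mathbf{a}_1, \ldots, \mathbf{a}_d\}$ and the values
$w_1, \ldots, w_d$, one can solve the set of equations
$h^{\langle \mathbf{a}_i, \boldsymbol{\alpha} \rangle} = (h^{\hat{a}_{F(i)}})^{w_i}$
for all $i \in [d]$ to obtain $h^{\boldsymbol{\alpha}}$. $h^{\boldsymbol{\beta}}$
can be computed in a similar manner.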


The Functional Re-encryption Program. Given the functional re-encryption key
$(\mathbf{A}, \mathbf{B})$ and an input ciphertext $(E, E')$ where
$E = (\mathbf{C}, D)$ and $E' = (\mathbf{C}', D')$, the functional re-encryption
program performs the following steps:


1. Sanity Check: If any of the components of the input ciphertext $E'$ is $1_G$
   or if the proof $\pi$ does not verify, output $(\hat{F}, \hat{G})$ for random
   $\hat{F}, \hat{G} \in G_T$. The sanity check is to ensure that the next step,
   namely input re-randomization, randomizes the ciphertext E.

2. Input Re-Randomization: Pick a random exponent $t \leftarrow \mathbb{Z}_q$ and
   compute $\tilde{\mathbf{C}} = \mathbf{C}(\mathbf{C}')^t$ and
   $\tilde{D} = D(D')^t$.

   Note that the random exponent t is used to re-randomize the encryption of
   $1_G$, and this re-randomized encryption of $1_G$ is multiplied with the
   encryption of m to get a re-randomized encryption of m.

3. The main Re-encryption step: Write
   $\tilde{\mathbf{C}} := (\tilde{C}_1, \ldots, \tilde{C}_d)$,
   $\mathbf{A} := (A_1, \ldots, A_d)$ and $\mathbf{B} := (B_1, \ldots, B_d)$.
   Compute

   $$\hat{F} = \prod_{j=1}^{d} e(\tilde{C}_j, A_j) \quad\text{and}\quad \hat{G} = \prod_{j=1}^{d} e(\tilde{C}_j, B_j) \cdot e(\tilde{D}, h)$$

   Output the ciphertext $(\hat{F}, \hat{G})$.


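Preserving functionality. Let the input ciphertext be
$(\mathbf{C}, D, \mathbf{C}', D', \pi)$. Given that $\pi$ verifies, we know these
values will be of the form $\mathbf{C} = g^{r\mathbf{a}_i}$, $D = g^{r} m$ and
$\mathbf{C}' = g^{r'\mathbf{a}_i}$, $D' = g^{r'}$. (If $\pi$ does not verify, then
both the functionality and the above program will output random group elements.)
Let the re-encryption key be $(\mathbf{A}, \mathbf{B})$ where
$\mathbf{A} = h^{\boldsymbol{\alpha}}$ and $\mathbf{B} = h^{\boldsymbol{\beta}}$.


- First, the input re-randomization step computes
  $\tilde{\mathbf{C}} = \mathbf{C}(\mathbf{C}')^t = g^{(r + tr')\mathbf{a}_i} = g^{\tilde{r}\mathbf{a}_i}$
  and $\tilde{D} = D(D')^t = g^{r + tr'} m = g^{\tilde{r}} m$, where we defined
  $\tilde{r} = r + tr'$.

- Second, the main re-encryption step computes

  $$\hat{F} = \prod_{j=1}^{d} e(\tilde{C}_j, A_j) = e(g, h)^{\tilde{r}\langle \mathbf{a}_i, \boldsymbol{\alpha} \rangle} = e(g, h)^{\tilde{r} w_i \hat{a}_{F(i)}}$$

  and

  $$\hat{G} = \prod_{j=1}^{d} e(\tilde{C}_j, B_j) \cdot e(\tilde{D}, h) = e(g, h)^{\tilde{r}\langle \mathbf{a}_i, \boldsymbol{\beta} \rangle} \cdot e(g^{\tilde{r}} m, h) = e(g, h)^{\tilde{r}(w_i - 1)} \cdot e(g^{\tilde{r}}, h) \cdot e(m, h) = e(g, h)^{\tilde{r} w_i} \cdot e(m, h)$$

- Now the ciphertext looks like $\hat{F} = e(g, h^{\hat{a}_{F(i)}\rho})$,
  $\hat{G} = e(g, h^{\rho}) \cdot e(m, h)$, where $\rho = \tilde{r} w_i$ is
  uniformly random in $\mathbb{Z}_q$, even given all the randomness in the input
  ciphertext. The claim about $\rho$ being uniformly random crucially relies on
  the “sanity check” step in the re-encryption program (in particular, since
  $r' \ne 0$).

Thus, the final ciphertext is distributed exactly like the output of
O-Enc($\hat{pk}_{F(i)}, m$).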
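
The correctness argument above can be checked numerically in a toy model. The
following sketch is an illustration under strong assumptions, not the paper's
implementation: group elements are stored as their exponents mod a prime (so the
pairing is multiplication of exponents and nothing is hidden), the NIZK proof
$\pi$ is omitted, identities and outputs are 0-based, and all function names are
hypothetical. In this model $h^{\hat{a}_j}$ and $\hat{a}_j$ coincide, so the
sketch cannot show that key generation needs only the output public keys.

```python
import secrets

Q = 2**61 - 1  # prime standing in for the group order q


def rand_nonzero():
    return 1 + secrets.randbelow(Q - 1)


def solve_mod(rows, rhs):
    """Solve rows * x = rhs over Z_Q (square system) by Gauss-Jordan elimination."""
    n = len(rows)
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] % Q), None)
        if piv is None:
            raise ValueError("matrix is singular mod Q")
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, Q)
        aug[col] = [v * inv % Q for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * p) % Q for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def i_gen(d):
    """Input secret key: d linearly independent vectors a_1..a_d in Z_Q^d."""
    while True:
        a = [[rand_nonzero() for _ in range(d)] for _ in range(d)]
        try:
            solve_mod(a, [0] * d)  # raises if the rows are dependent
        except ValueError:
            continue
        return a


def i_enc(a, i, m):
    """Exponents of (C, D, C', D') for identity i and message exponent m."""
    r, r2 = rand_nonzero(), rand_nonzero()
    return ([r * x % Q for x in a[i]], (r + m) % Q,
            [r2 * x % Q for x in a[i]], r2)


def rekey(a, a_hat, F):
    """Solve <a_i, alpha> = w_i * a_hat[F(i)] and <a_i, beta> = w_i - 1."""
    d = len(a)
    w = [rand_nonzero() for _ in range(d)]
    alpha = solve_mod(a, [w[i] * a_hat[F[i]] % Q for i in range(d)])
    beta = solve_mod(a, [(w[i] - 1) % Q for i in range(d)])
    return alpha, beta


def reencrypt(key, ct):
    alpha, beta = key
    C, D, C2, D2 = ct
    if D2 == 0 or any(x == 0 for x in C2):      # sanity check on E'
        return secrets.randbelow(Q), secrets.randbelow(Q)
    t = secrets.randbelow(Q)                     # input re-randomization
    Ct = [(c + t * c2) % Q for c, c2 in zip(C, C2)]
    Dt = (D + t * D2) % Q
    F_hat = sum(c * x for c, x in zip(Ct, alpha)) % Q
    G_hat = (sum(c * x for c, x in zip(Ct, beta)) + Dt) % Q
    return F_hat, G_hat


def o_dec(a_hat_j, ct, messages):
    F_hat, G_hat = ct
    target = (G_hat - F_hat * pow(a_hat_j, -1, Q)) % Q
    return next((m for m in messages if m % Q == target), None)
```

With `a = i_gen(3)`, two output keys `a_hat` and a policy such as `F = [0, 1, 0]`,
decrypting `reencrypt(rekey(a, a_hat, F), i_enc(a, i, m))` with `a_hat[F[i]]`
returns `m`, matching the derivation $\hat{G} \cdot \hat{F}^{-1/\hat{a}_{F(i)}} = e(m, h)$.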


Semantic security of encryption schemes. We show that the input and output
encryption schemes are semantically secure (in particular, the input scheme hides
both the message and the “identity”, and the output scheme is also key-private)
under the DDH assumption over different groups, even given the re-encryption
program. We present a detailed proof in the full version [11].


Remark. Note that if d = n = 1, then our construction (with the removal of 
certain now unnecessary parts, such as the NIZK proof) reduces to something 
very similar to that of [19]. Also, note that if the function F were to have 
larger (super-polynomial) domain, then our solution would satisfy the property 
of polynomial slowdown only if F were represented as a truth table. If F has 
large domain but a concise representation, then this property no longer holds.


4.3 Proof of Collusion-Resistant Secure Obfuscation


We show that our construction is a collusion-resistant average-case secure obfus- 
cator for the functional re-encryption functionality. In order to satisfy collusion- 
resistance, the encryption as well as the obfuscation scheme have to be modified 
somewhat. The modifications do not affect the functionality or the security of the 
scheme, and are merely artifacts that seem necessary to show that our functional 
re-encryption scheme meets the rigorous demands of being a secure obfuscation. 


A necessary modification to the encryption and obfuscation schemes. Consider
the case where a corrupt recipient that holds secret key $\hat{sk}_j$ colludes
with the re-encryption program. Now, essentially, this recipient has access to a
program that selectively decrypts input ciphertexts that are encrypted with an
identity i such that F(i) = j. However, the simulator only has oracle access to
such a program and must yet produce a “fake” re-encryption program that, on input
a ciphertext of message m with identity i, outputs a correct ciphertext of m under
$\hat{pk}_j$. Hence, in order to put the simulator on an equal footing with the
adversary, we need to give the simulator the power to produce an explicit program
which can selectively decrypt input ciphertexts. One way to do this is to cheat
and give the simulator the vector $\mathbf{a}_i$ from our construction, for all i
such that F(i) = j; we will refer to this vector as $sk_i$. (Note that $sk_i$ is
a secret key that allows for the selective decryption of ciphertexts with
identity i, but not any other ciphertext.) For ease of exposition, we shall for
now assume that the simulator obtains $sk_i$ for all i such that $F(i) \in T$.
However, we would not like to resort to this cheat, and we show in the full
version how it can be avoided. In other words, we first show the security of our
scheme in the modified model where the simulator obtains the $sk_i$ values for
all i such that $F(i) \in T$. Next (in the full version [11]), we show that if
the scheme is secure in this model, then it can be easily transformed into a
scheme that is secure in the standard model where the simulator, like the
real-world adversary, only gets the $\hat{sk}_j$ values for $j \in T$. We will now
focus on proving the former statement. Towards showing that our obfuscation
satisfies the collusion-resistant secure obfuscation definition in the model
where the simulator obtains the $sk_i$ values for all i such that $F(i) \in T$,
we first construct a simulator.


Simulator. Let $C \leftarrow \mathrm{FR}_{\lambda,F,d,n}$ be a functional
re-encryption circuit for the function $F : [d] \to [n]$, parameterized by the
input keys (pk, sk) and the output keys $(\hat{pk}_j, \hat{sk}_j)$ for all
$j \in [n]$. Let $T \subseteq [n]$ be a set of corrupted receivers. We construct a
simulator S that gets as input the secret keys $\hat{sk}_j$ of all the corrupted
receivers (where $j \in T$) and the secret keys $sk_i$ such that $F(i) \in T$,
and has oracle access to the functionality C.

First, consider the case where none of the receivers is corrupted. Then, the
simulator works as follows. Recall that the obfuscated re-encryption program
consists of the tuple $(h, h^{\boldsymbol{\alpha}}, h^{\boldsymbol{\beta}})$ where
$\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are solutions to some linear
equations involving the input and output secret keys. The simulator, instead,
simply picks $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ uniformly at random
(with no relation to the input or the output keys). It then runs the adversary on
this “junk functional re-encryption program” (along with the secret keys of the
corrupted receivers). Under the
SXDH assumption, we manage to show that this is indistinguishable from the 
obfuscated program that the adversary expects to get (even if the adversary is 
also given oracle access to the real re-encryption circuit C). 

If some of the receivers are corrupted, the simulator cannot choose
$\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ at random any more. Indeed, since
the distinguisher has the corrupted output keys, it can check if the
$\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ (in the exponent) satisfy the
equations involving the corrupted keys, namely $\{\hat{sk}_j\}_{j \in T}$. Thus,
the simulator has to choose $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ as
uniformly random solutions to a set of equations that involve the corrupted keys.
It turns out that this can be done efficiently since the simulator knows the keys
of the corrupted receivers as well.

Without further ado, let us present the simulator
$S^{C}(1^\lambda, T, \{\hat{sk}_j\}_{j \in T}, \{sk_i\}_{i \in F^{-1}(T)})$
that works as follows:


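1. Query the oracle C on input the string “keys” to get all the public keys,
   including the input public key
   $pk = (g, g^{\mathbf{a}_1}, \ldots, g^{\mathbf{a}_d})$ and the output public
   keys $\hat{pk}_1 = (h, h^{\hat{a}_1}), \ldots, \hat{pk}_n = (h, h^{\hat{a}_n})$.

2. Sample random $w_1, \ldots, w_d$ from $\mathbb{Z}_q$. Sample random
   $\boldsymbol{\alpha}, \boldsymbol{\beta}$ from $\mathbb{Z}_q^d$ such that

   $$\forall i \text{ s.t. } F(i) \in T: \quad \langle \mathbf{a}_i, \boldsymbol{\alpha} \rangle = w_i \hat{a}_{F(i)} \quad\text{and}\quad \langle \mathbf{a}_i, \boldsymbol{\beta} \rangle = w_i - 1$$

   Note that this can be done efficiently using the knowledge of the vectors
   $\mathbf{a}_i$ that we obtained in $\{sk_i\}_{i \in F^{-1}(T)}$, as well as the
   $\hat{a}_{F(i)}$ values which are part of the corrupted secret keys. Compute
   $\mathbf{A} = h^{\boldsymbol{\alpha}}$ and $\mathbf{B} = h^{\boldsymbol{\beta}}$.
   Output the tuple $(\mathbf{A}, \mathbf{B})$ as the re-encryption key.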
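
The sampling in step 2 amounts to drawing a uniformly random solution of an
underdetermined linear system over $\mathbb{Z}_q$. One standard way to do this,
sketched below, is to row-reduce the known equations, assign the free coordinates
uniformly at random, and solve for the pivot coordinates. This is an illustrative
sketch and an assumption about how the step could be realized, not the paper's
procedure: it works on exponents directly (so it returns $\boldsymbol{\alpha}$,
$\boldsymbol{\beta}$ rather than $h^{\boldsymbol{\alpha}}$,
$h^{\boldsymbol{\beta}}$), uses 0-based indices, and the names `random_solution`
and `simulate_key` are hypothetical.

```python
import secrets

Q = 2**61 - 1  # prime standing in for the group order q


def random_solution(rows, rhs, d):
    """Uniformly random x in Z_Q^d with rows * x = rhs (mod Q); len(rows) <= d."""
    aug = [[v % Q for v in r] + [b % Q] for r, b in zip(rows, rhs)]
    pivots = []   # pivots[k] is the pivot column of reduced row k
    rank = 0
    for col in range(d):
        piv = next((r for r in range(rank, len(aug)) if aug[r][col]), None)
        if piv is None:
            continue
        aug[rank], aug[piv] = aug[piv], aug[rank]
        inv = pow(aug[rank][col], -1, Q)
        aug[rank] = [v * inv % Q for v in aug[rank]]
        for r in range(len(aug)):
            if r != rank and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * p) % Q for v, p in zip(aug[r], aug[rank])]
        pivots.append(col)
        rank += 1
    if any(row[d] for row in aug[rank:]):
        raise ValueError("inconsistent system")
    x = [secrets.randbelow(Q) for _ in range(d)]  # free coordinates stay random
    for k, col in enumerate(pivots):
        free_part = sum(aug[k][j] * x[j] for j in range(d) if j not in pivots)
        x[col] = (aug[k][d] - free_part) % Q
    return x


def simulate_key(sk_in, sk_out, F, d):
    """sk_in maps i in F^{-1}(T) to a_i; sk_out maps j in T to a_hat_j."""
    idx = sorted(sk_in)
    w = {i: secrets.randbelow(Q) for i in idx}
    rows = [sk_in[i] for i in idx]
    alpha = random_solution(rows, [w[i] * sk_out[F[i]] for i in idx], d)
    beta = random_solution(rows, [w[i] - 1 for i in idx], d)
    return alpha, beta
```

When T is empty there are no constraints and both vectors are uniformly random,
which is the "junk" key of the collusion-free case. When T is non-empty, the
relation $\langle \mathbf{a}_i, \boldsymbol{\alpha} \rangle = (\langle \mathbf{a}_i, \boldsymbol{\beta} \rangle + 1)\,\hat{a}_{F(i)}$
holds for every i with $F(i) \in T$, which is the consistency that a distinguisher
holding the corrupted keys could test.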


We now show that the output of the simulator described above is
indistinguishable from an obfuscation of the re-encryption functionality (given
in Section 4.2), even to a distinguisher that has the corrupted receivers' secret
keys and oracle access to the re-encryption functionality. This proves that the
obfuscation scheme we constructed in Section 4.2 is a collusion-resistant
average-case secure obfuscation satisfying Definition 2. More formally, we show:


Theorem 2. Under SXDH, for any ppt distinguisher D and corrupt set
$T \subseteq [n]$,

$$D^{C}\big(\mathsf{Obf}(C),\ T,\ \{\hat{sk}_j\}_{j \in T},\ \{sk_i\}_{i \in F^{-1}(T)}\big) \approx D^{C}\big(S^{C}(1^\lambda, T, \{\hat{sk}_j\}_{j \in T}, \{sk_i\}_{i \in F^{-1}(T)}),\ \{\hat{sk}_j\}_{j \in T},\ \{sk_i\}_{i \in F^{-1}(T)}\big)$$

for obfuscator Obf, where $C \leftarrow \mathrm{FR}_{\lambda,F,d,n}$ is a
uniformly random re-encryption circuit parameterized by
$(pk, sk) \leftarrow \text{I-Gen}(1^\lambda)$ and
$(\hat{pk}_j, \hat{sk}_j) \leftarrow \text{O-Gen}(1^\lambda)$.

From the above theorem, our main theorem (which we stated informally as
Theorem 1) follows after making the necessary modifications to the construction
outlined earlier. We now describe a sketch of the proof of Theorem 2. For the
formal proof, see the full version [11].


Proof. (sketch.) At a high level, the proof will go through the following steps:


- Step 1: For simplicity, let us first consider the case when there is no
  collusion, that is, neither the distinguisher nor the simulator has access to
  any of the output secret keys. Later, we will point out the necessary
  modifications to achieve collusion-resistance.

  We first show that the re-encryption key is indistinguishable from random
  group elements to any distinguisher D who is given the public keys for the
  input and output encryption scheme (but no oracle access). In other words,
  we will show that constructing a re-encryption key $(\mathbf{A}, \mathbf{B})$
  where $\mathbf{A} = h^{\boldsymbol{\alpha}}$ and
  $\mathbf{B} = h^{\boldsymbol{\beta}}$ with $\boldsymbol{\alpha}, \boldsymbol{\beta}$
  being solutions to the equations

  $$\langle \mathbf{a}_i, \boldsymbol{\alpha} \rangle = w_i \hat{a}_{F(i)} \quad\text{and}\quad \langle \mathbf{a}_i, \boldsymbol{\beta} \rangle = w_i - 1 \quad\text{for all } i \in [d] \qquad (1)$$

  is indistinguishable from constructing a re-encryption key with uniformly
  random $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$. This follows from two
  ideas. First, under the DDH assumption in group H, it is hard to distinguish
  between $(h, h^{\boldsymbol{\alpha}}, h^{\boldsymbol{\beta}})$ where
  $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are solutions to Equations (1)
  from the case where they are solutions to the same set of equations with the
  right-hand sides replaced by uniformly random elements in $\mathbb{Z}_q$ (see
  footnote 7). Next, we note that choosing
  $\boldsymbol{\alpha}, \boldsymbol{\beta}$ as a solution to a set of equations
  with uniformly random right-hand side is equivalent to simply choosing random
  $\boldsymbol{\alpha}, \boldsymbol{\beta}$. This completes the first step. In
  the full version [11] we show that this generalizes to the case where $T$ is
  non-empty, and the simulator's $\boldsymbol{\alpha}, \boldsymbol{\beta}$ are
  chosen as a random solution to the resulting underconstrained set of equations.

- Step 2: Next, we will provide our distinguisher D with oracle access to
  a random oracle that simply returns random group elements of the same
  format as the output ciphertext of the re-encryption program. (The only
  exception is that, when it receives a ciphertext encrypted under id such that
  $F(id) \in T$, it honestly performs the re-encryption.) We then show that the
  re-encryption key is indistinguishable from random group elements to this
  distinguisher $D^{RO}$ as well.

  This follows from Step 1 fairly easily once we note that the distinguisher in
  Step 1 could easily simulate this random oracle itself.

- Step 3: Next, we will provide our distinguisher D with oracle access to either
  the re-encryption oracle or the random oracle, and argue that D will not be
  able to determine which oracle it is given, even if it is also given the real
  re-encryption key.

  The main intuition behind this proof is that, based on SXDH, we can show
  that honestly generated output ciphertexts are indistinguishable from random
  tuples. This is fairly easy to see: consider public key $h^{\hat{a}}$, and the
  following tuple $[e(g, h^{w}),\ e(g, h^{r}) \cdot e(m, h)]$ for random
  $\hat{a}, r \in \mathbb{Z}_q$. If $w = \hat{a}r$, this is a valid encryption of
  m; if w is a random element of $\mathbb{Z}_q$, then this is a random tuple from
  $G_T \times G_T$.

  A fairly straightforward hybrid argument then shows that a real encryption
  oracle for public keys $\hat{pk}_1, \ldots, \hat{pk}_n$ is indistinguishable
  from a random oracle which only produces valid ciphertexts for $\hat{pk}_i$
  with $i \in T$ (even when the distinguisher is given $\hat{sk}_i$ for
  $i \in T$).

  Now, we note that we can generate a real re-encryption key and perfectly
  simulate either the real re-encryption oracle or the random re-encryption
  oracle given only $\hat{pk}_1, \ldots, \hat{pk}_n$, and either the encryption
  oracle or the random oracle described above. We conclude that the real
  re-encryption oracle and random re-encryption oracle are indistinguishable
  even given the real re-encryption key (and $\hat{sk}_i$ for $i \in T$).

- Step 4: Finally, we will again provide our distinguisher D with oracle access
  to either the re-encryption oracle or the random oracle and argue that it will
  not be able to determine which oracle it is given, this time when given the
  simulated re-encryption key instead.

  Again, this follows from Step 3, when we note that the distinguisher in Step
  3 could easily ignore the re-encryption key it is given and instead run the
  simulator to generate a simulated one.


7 Note that the right-hand sides of Equation (1) are not random as such. For example,
consider the case where F(1) = F(2) = 1. Then, the right-hand sides of the four
equations corresponding to i = 1 and i = 2 are $w_1 \hat{a}_1$, $w_1 - 1$,
$w_2 \hat{a}_1$, $w_2 - 1$, which are clearly correlated.


We have argued that the distinguisher has the same behavior given the real re- 
encryption key and real re-encryption oracle or the real re-encryption key and 
random oracle (Step 3), that it has the same behavior given the real re-encryption 
key and random oracle or the simulated re-encryption key and random oracle 
(Step 2), and that it has the same behavior given the simulated re-encryption key 
and random oracle or the simulated re-encryption key and real re-encryption ora- 
cle (Step 4). Putting everything together, we conclude that the real re-encryption 
key and simulated re-encryption key are indistinguishable, even given access to 
the real re-encryption oracle. Thus, we obtain the proof of Theorem 2.
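
As a small sanity check of the second idea used in Step 1, the snippet below
enumerates a tiny instance and confirms that, when the vectors
$\mathbf{a}_1, \ldots, \mathbf{a}_d$ are linearly independent, the map from a
vector to the right-hand sides of its equations is a bijection. A uniformly
random right-hand side therefore corresponds to a uniformly random solution. The
modulus 5, the 2x2 matrix and the function names are assumptions chosen only to
keep the enumeration small.

```python
from itertools import product

Q = 5                      # tiny prime standing in for q
A = [[1, 2], [3, 4]]       # rows a_1, a_2; determinant -2 = 3 mod 5, so invertible


def apply(matrix, x):
    """Right-hand sides (<a_1, x>, ..., <a_d, x>) mod Q."""
    return tuple(sum(m * v for m, v in zip(row, x)) % Q for row in matrix)


def solutions_by_rhs(matrix):
    """Group every x in Z_Q^d by the right-hand side it produces."""
    d = len(matrix)
    table = {}
    for x in product(range(Q), repeat=d):
        table.setdefault(apply(matrix, x), []).append(x)
    return table
```

For the matrix `A`, `solutions_by_rhs(A)` has one entry for each of the 25
possible right-hand sides, and each entry holds exactly one solution.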


Acknowledgements. We wish to thank Markulf Kohlweiss for suggesting the 
use of the SXDH assumption, which simplified our construction.

Abstract. The wide variety of small, computationally weak devices,
and the growing number of computationally intensive tasks make it
appealing to delegate computation to data centers. However, outsourcing 
computation is useful only when the returned result can be trusted, which 
makes verifiable computation (VC) a must for such scenarios. 

In this work we extend the definition of verifiable computation in 
two important directions: public delegation and public verifiability, which 
have important applications in many practical delegation scenarios. Yet, 
existing VC constructions based on standard cryptographic assumptions 
fail to achieve these properties. 

As the primary contribution of our work, we establish an important 
(and somewhat surprising) connection between verifiable computation 
and attribute-based encryption (ABE), a primitive that has been widely 
studied. Namely, we show how to construct a VC scheme with public del- 
egation and public verifiability from any ABE scheme. The VC scheme 
verifies any function in the class of functions covered by the permissible 
ABE policies (currently Boolean formulas). This scheme enjoys a very 
efficient verification algorithm that depends only on the output size. Ef- 
ficient delegation, however, requires the ABE encryption algorithm to 
be cheaper than the original function computation. Strengthening this 
connection, we show a construction of a multi-function verifiable com- 
putation scheme from an ABE scheme with outsourced decryption, a 
primitive defined recently by Green, Hohenberger and Waters (USENIX 
Security 2011). A multi-function VC scheme allows the verifiable evalu- 
ation of multiple functions on the same preprocessed input. 
1 Introduction


In the modern age of cloud computing and smartphones, asymmetry in computing 
power seems to be the norm. Computationally weak devices such as smartphones 
gather information, and when they need to store the voluminous data they col- 
lect or perform expensive computations on their data, they outsource the storage 
and computation to a large and powerful server (a “cloud”, in modern parlance). 
Typically, the clients have a pay-per-use arrangement with the cloud, where the 
cloud charges the client proportional to the “effort” involved in the computation. 

One of the main security issues that arises in this setting is — how can the 
clients trust that the cloud performed the computation correctly? After all, the 
cloud has the financial incentive to run (occasionally, perhaps) an extremely fast 
but incorrect computation, freeing up valuable compute time for other transac- 
tions. Is there a way to verifiably outsource computations, where the client can, 
without much computational effort, check the correctness of the results provided 
by the cloud? Furthermore, can this be done without requiring much interac- 
tion between the client and the cloud? This is the problem of non-interactive 
verifiable computation, which was considered implicitly in the early work on
efficient arguments by Kilian and computationally sound proofs (CS proofs)
by Micali [20], and which has been the subject of much attention lately.

The starting point of this paper is that while the recent solutions consider 
and solve the bare-bones verifiable computation problem in its simplest form, 
there are a number of desirable features that they fail to achieve. We consider 
two such properties — namely, public delegatability and public verifiability. 


Public Delegatability. In a nutshell, public delegatability says that everyone 
should be able to delegate computations to the cloud. In some protocols,
a client who wishes to delegate computation of a function F is required
to first run an expensive pre-processing phase (wherein her computation
is linear in the size of the circuit for F) to generate a (small) secret key $SK_F$
and a (large) evaluation key $EK_F$. This large initial cost is then amortized over
multiple executions of the protocol to compute $F(x_i)$ for different inputs $x_i$, but
the client needs the secret key $SK_F$ in order to initiate each such execution. In
other words, clients can delegate computation to the cloud only if they put in a 
large initial computational investment. This makes sense only if the client wishes 
to run the same computation on many different inputs. Can clients delegate 
computation without making such a large initial commitment of resources? 

As an example of a scenario where this might come in handy, consider a clinic 
with a doctor and a number of lab assistants, which wishes to delegate the compu- 
tation of a certain expensive data analysis function F to a cloud service. Although 
the doctor determines the structure and specifics of F, it is in reality the lab as- 
sistants who come up with inputs to the function and perform the delegation. In 
this scenario, we would like to ask the doctor to run the (expensive) pre-processing 
phase once and for all, and generate a (small) public key $PK_F$ and an evaluation
key $EK_F$. The public key lets anyone, including the lab assistants, delegate the
computation of F to the cloud and verify the results. Thus, once the doctor makes
the initial investment, any of the lab assistants can delegate computations to the 
cloud without the slightest involvement of the doctor. Needless to say, the cloud 
should not be able to cheat even given $PK_F$ and $EK_F$.

Goldwasser, Kalai and Rothblum present a publicly delegatable verifiable 
computation protocol for functions in the complexity class NC (namely, func- 
tions that can be computed by circuits of size poly(n) and depth polylog(n)); 
indeed, their protocol is stronger in that it does not even require a pre-processing 
phase. In contrast, as mentioned above, many of the protocols for verifying
general functions are not publicly delegatable. In concurrent work,
Canetti, Riva, and Rothblum propose a similar notion (though they call it
“public verifiability”) and construct a protocol, based on collision-resistant hashing
and poly-logarithmic PIR, for general circuits C where the client runs in time
poly(log(|C|), depth(C)); they do not achieve the public verifiability property
we define below. Computationally sound (CS) proofs achieve public delegatability;
however, the known constructions of CS proofs are either in the random
oracle model or rely on non-standard “knowledge of exponent”-type
assumptions. Indeed, this seems to be an inherent limitation of solutions
based on CS proofs since Gentry and Wichs showed recently that CS proofs 
cannot be based on any falsifiable cryptographic assumption (using a black-box 
security reduction). Here, we are interested in standard model constructions, 
based on standard (falsifiable) cryptographic assumptions. 


Public Verifiability. In a similar vein, the delegator should be able to produce a 
(public) “verification key” that enables anyone to check the cloud’s work. In the 
context of the example above, when the lab assistants delegate a computation on 
input x, they can also produce a verification key $VK_x$ that will let the patients,
for example, obtain the answer from the cloud and check its correctness. Neither 
the lab assistants nor the doctor need to be involved in the verification process. 
Needless to say, the cloud cannot cheat even if it knows the verification key $VK_x$.

Papamanthou, Tamassia, and Triandopoulos present a verifiable compu- 
tation protocol for set operations that allows anyone who receives the result of 
the set operation to verify its correctness. In concurrent work, Papamanthou, 
Shi, and Tamassia propose a similar notion, but they achieve it only for 
multivariate polynomial evaluation and differentiation, and the setup and eval- 
uation run in time exponential in the degree; they do not consider the notion 
of public delegation. Neither the Goldwasser-Kalai-Rothblum protocol nor
any of the later works seems to be publicly verifiable.


Put together, we call a verifiable computation protocol that is both publicly 
delegatable and publicly verifiable a public verifiable computation protocol. We 
are not aware of any such protocol (for a general class of functions) that is 
non-interactive and secure in the standard model. Note that we still require the 
party who performs the initial function preprocessing (the doctor in the example 
above) to be trusted by those delegating inputs and verifying outputs. 

As a bonus, a public verifiable computation protocol is immune to the
“rejection problem” that affects several previous constructions. Essentially,
the problem is that these protocols do not provide reusable soundness; i.e., a
malicious cloud that is able to observe the result of the verification procedure 
(namely, the accept /reject decision) on polynomially many inputs can eventually 
break the soundness of the protocol. It is an easy observation that public veri- 
fiable computation protocols do not suffer from the rejection problem. Roughly 
speaking, verification in such protocols depends only on the public key and some 
(instance-specific) randomness generated by the delegator, and not on any long- 
term secret state. Thus, obtaining the result of the verification procedure on one 
instance does not help break the soundness on a different instance (see footnote 1).


This paper is concerned with the design of public (non-interactive) verifiable 
computation protocols. 


1.1 Our Results and Techniques 


Our main result is a (somewhat surprising) connection between the notions of 
attribute-based encryption (ABE) and verifiable computation (VC). In a nut- 
shell, we show that a public verifiable computation protocol for a class of func- 
tions F can be constructed from any attribute-based encryption scheme for a 
related class of functions, namely $\mathcal{F} \cup \overline{\mathcal{F}}$. Recall that attribute-based encryption
(ABE) [15] is a rich class of encryption schemes where secret keys $\mathsf{ABE.SK}_F$
are associated with functions F, and can decrypt ciphertexts that encrypt a 
message m under an “attribute” x if and only if F(x) = 1. 

For simplicity, we state all our results for the case of Boolean functions, namely 
functions with one-bit output. For functions with many output bits, we simply 
run independent copies of the verifiable computation protocol for each output 
bit. 
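
The bit-by-bit reduction described above can be sketched as follows. This is only an illustration under simplifying assumptions: the helper names `split_output_bits` and `reassemble` are hypothetical, inputs and outputs are modelled as non-negative Python integers, and each per-bit result is taken to be either a verified bit or `None` (standing in for a copy of the protocol that rejected).

```python
from typing import Callable, List, Optional


def split_output_bits(f: Callable[[int], int], num_bits: int) -> List[Callable[[int], int]]:
    """Turn an integer-valued f into num_bits Boolean functions, one per output bit."""
    # The default argument i=i pins the bit index for each lambda.
    return [lambda x, i=i: (f(x) >> i) & 1 for i in range(num_bits)]


def reassemble(bits: List[Optional[int]]) -> Optional[int]:
    """Recombine verified bits; reject (None) if any single copy rejected."""
    if any(b is None for b in bits):
        return None
    return sum(b << i for i, b in enumerate(bits))
```

Each Boolean function returned by `split_output_bits` would be handed to its own independent copy of the one-bit protocol; the overall answer is accepted only if every copy accepts.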


Theorem 1 (Main Theorem, Informal). Let $\mathcal{F}$ be a class of Boolean functions,
and let $\overline{\mathcal{F}} = \{\overline{F} \mid F \in \mathcal{F}\}$, where $\overline{F}$ denotes the complement of the function
$F$. If there is a key-policy ABE scheme for $\mathcal{F} \cup \overline{\mathcal{F}}$, then there is a public verifiable
computation protocol for $\mathcal{F}$.


Some remarks about this theorem are in order. 

1. First, our construction is in the pre-processing model, where we aim to out- 
source the computation of the same function F on polynomially many inputs 
$x_i$ with the goal of achieving an amortized notion of efficiency. This is the
same as the notion considered in earlier work, and different from the one in [13].
See Definition 

2. Secondly, since the motivation for verifiable computation is outsourcing com- 
putational effort, efficiency for the client is obviously a key concern. Our pro- 
tocol will be efficient for the client, as long as computing an ABE encryption 
(on input a message m and attribute x) takes less time than evaluating the 
function F on x. We will further address the efficiency issue in the context 
of concrete instantiations below (as well as in Section 3.2).


1 In fact, this observation applies also to any protocol that is publicly delegatable and 
not necessarily publicly verifiable. 
3. Third, we only need a weak form of security for attribute-based encryption
which we will refer to as one-key security. Roughly speaking, this requires 
that an adversary, given a single key $\mathsf{ABE.SK}_F$ for any function F of its
choice, cannot break the semantic security of a ciphertext under any attribute 
x such that F(x) = 0. Much research effort on ABE has been dedicated to 
achieving the much stronger form of security against collusion, namely when 
the adversary obtains secret keys for not just one function, but polynomially 
many functions of its choice. We will not require the strength of these re- 
sults for our purposes. On the same note, constructing one-key secure ABE 
schemes is likely to be much easier than full-fledged ABE schemes. 


Note on Terminology: Attribute-based Encryption versus Predicate Encryption. 
In this paper, we consider attribute-based encryption (ABE) schemes to be ones 
in which each secret key $\mathsf{ABE.SK}_F$ is associated with a function F, and can
decrypt ciphertexts that encrypt a message m under an “attribute” x if and only
if F(x) = 1. This formulation is implicit in the early definitions of ABE
introduced by Goyal, Pandey, Sahai and Waters [15, 25]. However, their work refers
to F as an access structure, and existing ABE instantiations are restricted to
functions (or access structures) that can be represented as polynomial-size span
programs (a generalization of Boolean formulas) [15]. While such restrictions
are not inherent in the definition of ABE, the fully general formulation
we use above was first explicitly introduced by Katz, Sahai, and Waters, who
dubbed it predicate encryption. Note that we do not require attribute-hiding
or policy/function-hiding, properties often associated with predicate encryption
schemes (there appears to be some confusion in the literature as to whether
attribute-hiding is inherent in the definition of predicate encryption [19],
but the original formulation does not seem to require it).

Thus, in a nutshell, our work can be seen as using ABE schemes for general 
functions, or equivalently, predicate encryption schemes that do not hide the 
attributes or policy, in order to construct verifiable computation protocols. 


Let us now describe an outline of our construction. The core idea of our con- 
struction is simple: attribute-based encryption schemes naturally provide a way 
to “prove” that F(x) = 1. Say the server is given the secret key $\mathsf{ABE.SK}_F$ for
a function F, and a ciphertext that encrypts a random message m under the 
attribute x. The server will succeed in decrypting the ciphertext and recovering 
m if and only if F(x) = 1. If F(x) = 0, he fares no better at finding the message 
than a random guess. The server can then prove that F(x) = 1 by returning the 
decrypted message. 

More precisely, this gives an effective way for the server to convince the client 
that F(x) = 1. The pre-processing phase for the function F generates a master 
public key ABE.MPK for the ABE scheme (which acts as the public key for the 
verifiable computation protocol) and the secret key $\mathsf{ABE.SK}_F$ for the function F
(which acts as the evaluation key for the verifiable computation protocol). Given
the public key and an input x, the delegator encrypts a random message m under
the attribute x and sends it to the server. If F(x) = 1, the server manages to
decrypt and return m, but otherwise, he returns $\perp$. Now,
— If the client gets back the same message that she encrypted, she is convinced
beyond doubt that F(x) = 1. This is because, if F(x) were 0, the server 
could not have found m (except with negligible probability, assuming the 
message is long enough). 

— However, if she receives no answer from the server, it could have been because 
F(x) = 0 and the server is truly unable to decrypt, or because F(x) = 1 but 
the server intentionally refuses to decrypt. 


Thus, we have a protocol with one-sided error — if F(x) = 0, the server can never 
cheat, but if F(x) = 1, he can. 

A verifiable computation protocol with no error can be obtained from this 
by two independent repetitions of the above protocol — once for the function F 
and once for its complement $\overline{F}$. A verifiable computation protocol for functions
with many output bits can be obtained by repeating the one-bit protocol above 
for each of the output bits. Intuitively, since the preprocessing phase does not 
create any secret state, the protocol provides public verifiable computation. Fur- 
thermore, the verifier performs as much computation as is required to compute 
two ABE encryptions. 
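
The two-repetition idea above can be sketched in a few lines. This is a toy illustration under loudly stated assumptions, not the paper's construction as specified: `ToyCiphertext`, `toy_enc` and `toy_dec` are hypothetical stand-ins that model only ABE correctness (a key for F opens a ciphertext exactly when F(x) = 1) and provide no secrecy at all; SHA-256 is used as a placeholder for a one-way function g; and publishing g(m) rather than m as the verification key is our reading of the remark that verification involves just a one-way function computation.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

BoolFunc = Callable[[int], int]


@dataclass(frozen=True)
class ToyCiphertext:
    """Stand-in for an ABE ciphertext. NOT encryption: the message is in the clear."""
    attribute: int
    message: bytes


def toy_enc(message: bytes, x: int) -> ToyCiphertext:
    return ToyCiphertext(attribute=x, message=message)


def toy_dec(key_function: BoolFunc, c: ToyCiphertext) -> Optional[bytes]:
    # Models ABE correctness only: a key for F opens c iff F(attribute) = 1.
    return c.message if key_function(c.attribute) == 1 else None


def g(m: bytes) -> bytes:
    # Placeholder for a one-way function (assumption: SHA-256).
    return hashlib.sha256(m).digest()


def key_gen(f: BoolFunc) -> Tuple[BoolFunc, BoolFunc]:
    # Evaluation key: one "ABE key" for F and one for its complement.
    return f, (lambda x: 1 - f(x))


def prob_gen(x: int):
    # Encrypt two fresh random messages under the attribute x.
    m_one, m_zero = secrets.token_bytes(16), secrets.token_bytes(16)
    sigma_x = (toy_enc(m_one, x), toy_enc(m_zero, x))
    vk_x = (g(m_one), g(m_zero))  # public verification key
    return sigma_x, vk_x


def compute(ek, sigma_x):
    key_f, key_f_bar = ek
    c_one, c_zero = sigma_x
    # Exactly one of the two decryptions succeeds for an honest worker.
    return toy_dec(key_f, c_one), toy_dec(key_f_bar, c_zero)


def verify(vk_x, sigma_out) -> Optional[int]:
    d_one, d_zero = sigma_out
    if d_one is not None and g(d_one) == vk_x[0]:
        return 1  # worker opened the ciphertext tied to F, so F(x) = 1
    if d_zero is not None and g(d_zero) == vk_x[1]:
        return 0  # worker opened the ciphertext tied to the complement
    return None   # reject
```

In this sketch the verifier's work is two hash evaluations, independent of how expensive F is; in a real instantiation the cost of `prob_gen` is that of two ABE encryptions.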

Perspective: Signatures on Computation. Just as digital signatures authenticate 
messages, the server’s proof in a non-interactive verifiable computation protocol 
can be viewed as a “signature on computation”, namely a way to authenti- 
cate that the computation was performed correctly. Moni Naor has observed 
that identity-based encryption schemes give us digital signature schemes, rather 
directly. Given our perspective, one way to view our result is as a logical
extension of Naor’s observation to say that just as IBE schemes give us digital 
signatures, ABE schemes give us signatures on computation or, in other words, 
non-interactive verifiable computation schemes. 

Instantiations. Instantiating our protocol with existing ABE schemes creates 
challenges with regard to functionality, security, and efficiency. We discuss these
issues briefly below and defer a detailed discussion to Section 3.2.

As mentioned earlier, existing ABE schemes only support span programs or 
polynomial-size Boolean formulas [15, 21], which restricts us to this class
of functions as well. In particular, the more recent ABE schemes, such as that 
of Ostrovsky, Sahai, and Waters [21], support the class of all (not necessarily 
monotone) formulas. 

Another challenge is that most ABE schemes [15] are proven secure only
in a selective-security model. As a result, instantiating the protocol above with
such a scheme would inherit this limitation. If we instantiate our protocol with the
scheme of Ostrovsky, Sahai, and Waters [21], we achieve a VC protocol for the class
of polynomial-size Boolean formulas, which has delegation and verification
algorithms whose combined complexity is lower than that of the function evaluation.
The complexity gain arises because the delegation algorithm essentially
runs the ABE encryption algorithm, whose complexity is a fixed polynomial
in |x|, the size of the input to the function, as well as the security parameter.
The verification algorithm is very simple, involving just a one-way function com- 
putation. The resulting verifiable computation protocol is selectively secure. 
Unfortunately, removing the “selective restriction” seems to be a challenge
with existing ABE schemes. Although there have recently been constructions of 
adaptively secure ABE schemes, starting from the work of Lewko et al. [19], all 
these schemes work for bounded polynomial-size Boolean formulas. The upshot
is that the amount of work required to generate an encryption is proportional to 
the size of the formula, which makes the delegation as expensive as the function 
evaluation (and thus, completely useless)! 

Much work in the ABE literature has been devoted to constructing ABE 
schemes that are secure against collusion. Namely, the requirement is that even 
if an adversary obtains secret keys for polynomially many functions, the scheme 
still retains security (in a precise sense). However, for our constructions, we re- 
quire much less from the ABE scheme! In particular, we only need the scheme 
to be secure against adversaries that obtain the secret key for a single func- 
tion. This points to instantiating our general construction with a one-key se- 
cure ABE scheme from the work of Sahai and Seyalioglu for the class of 
bounded polynomial-size circuits. Unfortunately, because their scheme only sup- 
ports bounded-size circuits, it suffers from the same limitation as that of Lewko 
et al. [19]. However, we can still use their construction to obtain a VC protocol
where the parallel complexity of the verifier is significantly less than that required 
to compute the function. 

We also note that when we instantiate our VC protocol with existing ABE 
schemes, the computation done by both the client and the worker is significantly 
cheaper than in any previous VC scheme, since we avoid the overhead of PCPs 
and FHE. However, existing ABE schemes restrict us to either formulas or a less 
attractive notion of parallel efficiency. It remains to be seen whether this effi- 
ciency can be retained while expanding the security offered and the class of func- 
tions supported. Fortunately, given the amount of interest in and effort devoted 
to new ABE schemes, we expect further improvements in both the efficiency and 
security of these schemes. Our result demonstrates that such improvements, as 
well as improvements in the classes of functions supported, will benefit verifiable 
computation as well. 


1.2 Other Results 

Multi-Function Verifiability and ABE with Outsourcing. The definition of veri- 
fiable computation focuses on the evaluation of a single function over multiple 
inputs. In many constructions, the evaluated function is embedded in
the parameters for the VC scheme that are used for the input processing for the
computation. Thus evaluations of multiple functions on the same input would
require repeated invocations of the ProbGen algorithm. Notable exceptions are
approaches based on PCPs, which may require a single offline stage for
input processing and then allow multiple function evaluations. However, such 
approaches inherently require verification work proportional to the depth of the 
circuit, which is at least logarithmic in the size of the function and for some 
functions can be also proportional to the size of the circuit. Further these ap- 
proaches employ either fully homomorphic encryption or private information 
retrieval schemes to achieve their security properties. 
Using the recently introduced definition of ABE with outsourcing [16], we
achieve a multi-function verifiable computation scheme that decouples the eval- 
uated function from the parameters of the scheme necessary for the input prepa- 
ration. This VC scheme provides separate algorithms for input and function 
preparation, which subsequently can be combined for multiple evaluations. When 
instantiated with an existing ABE scheme with outsourcing [16], the verification 
algorithm for the scheme is very efficient: its complexity is linear in the output 
size but independent of the input length and the complexity of the computation. 
Multi-function VC provides significant efficiency improvements whenever mul- 
tiple functions are evaluated on the same input, since a traditional VC scheme 
would need to invoke ProbGen for every function. 


Attribute-Based Encryption from Verifiable Computation. We also consider the 
opposite direction of the ABE-VC relation: can we construct an ABE scheme 
from a VC scheme? We are able to show how to construct an ABE scheme from 
a very special class of VC schemes with a particular structure. Unfortunately, 
this does not seem to result in any new ABE constructions. 


Due to space constraints, we defer the details to the full version of this paper. 


2 Definitions 


2.1 Public Verifiable Computation 


We propose two new properties of verifiable computation schemes, namely 
— Public Delegation, which allows arbitrary parties to submit inputs for dele- 
gation, and 
— Public Verifiability, which allows arbitrary parties (and not just the delega- 
tor) to verify the correctness of the results returned by the worker. 
Together, a verifiable computation protocol that satisfies both properties is called 
a public verifiable computation protocol. The following definition captures these 
two properties. 


Definition 1 (Public Verifiable Computation). A public verifiable compu- 
tation scheme (with preprocessing) VC is a four-tuple of polynomial-time algo- 
rithms (KeyGen, ProbGen, Compute, Verify) which work as follows: 

— $(PK_F, EK_F) \leftarrow \mathsf{KeyGen}(F, 1^\lambda)$: The randomized key generation algorithm
takes as input a security parameter $\lambda$ and the function F, and outputs a
public key $PK_F$ and an evaluation key $EK_F$.

— $(\sigma_x, VK_x) \leftarrow \mathsf{ProbGen}(PK_F, x)$: The randomized problem generation
algorithm uses the public key $PK_F$ to encode an input x into public values $\sigma_x$
and $VK_x$. The value $\sigma_x$ is given to the worker to compute with, whereas
$VK_x$ is made public, and later used for verification.

— $\sigma_{out} \leftarrow \mathsf{Compute}(EK_F, \sigma_x)$: The deterministic worker algorithm uses the
evaluation key $EK_F$ together with the value $\sigma_x$ to compute a value $\sigma_{out}$.

— $y \leftarrow \mathsf{Verify}(VK_x, \sigma_{out})$: The deterministic verification algorithm uses the
verification key $VK_x$ and the worker’s output $\sigma_{out}$ to compute a string
$y \in \{0,1\}^* \cup \{\perp\}$. Here, the special symbol $\perp$ signifies that the verification
algorithm rejects the worker’s answer $\sigma_{out}$.
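
To make the syntax of the four algorithms concrete, here is a deliberately trivial instance in Python. It is a sketch under stated assumptions rather than anything proposed in the paper: functions are modelled as Python callables on integers, `None` plays the role of $\perp$, and the "scheme" simply has the delegator evaluate F itself inside `prob_gen`. It therefore matches the syntax and is correct, but it gives the client no savings, which is exactly what an efficiency requirement has to rule out.

```python
from typing import Callable, Optional, Tuple

Func = Callable[[int], int]


def key_gen(f: Func, security_parameter: int = 128) -> Tuple[Func, Func]:
    # (PK_F, EK_F): in this toy, both keys are just a description of F.
    return f, f


def prob_gen(pk_f: Func, x: int) -> Tuple[int, int]:
    sigma_x = x     # encoded input handed to the worker
    vk_x = pk_f(x)  # the delegator does all the work: no efficiency gain
    return sigma_x, vk_x


def compute(ek_f: Func, sigma_x: int) -> int:
    # Worker evaluates F on the encoded input.
    return ek_f(sigma_x)


def verify(vk_x: int, sigma_out: int) -> Optional[int]:
    # Accept the worker's answer only if it matches the verification key.
    return sigma_out if sigma_out == vk_x else None
```

Note that `verify` takes only the public values $VK_x$ and $\sigma_{out}$, so anyone can run it; no secret state from `key_gen` or `prob_gen` is needed.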


A number of remarks on the definition are in order. 

First, in some instantiations, the size of the public key (but not the evaluation 
key) will be independent of the function F, whereas in others, both the public 
key and the evaluation key will be as long as the description length of F. For 
full generality, we refrain from making the length of the public key a part of the 
syntactic requirement of a verifiable computation protocol, and instead rely on 
the definition of efficiency to enforce this (see Definition [4] below). 

Secondly, our definition can be viewed as a “public-key version” of the earlier
VC definition [10]. In the earlier definition, KeyGen produces a secret
key that is used as an input to ProbGen and, in turn, ProbGen produces a
secret verification value needed for Verify (neither of these can be shared with
the worker without losing security). Indeed, the “secret-key” nature of these
definitions means that the schemes could be attacked given just oracle access
to the verification function (and indeed, there are concrete attacks of this nature
against the schemes in [10]). Our definition, in contrast, is stronger
in that it allows any party holding the public key $PK_F$ to delegate and verify
computation of the function F on any input x, even if the party who originally 
ran ProbGen is no longer online. This, in turn, automatically protects against 
attacks that use the verification oracle. 


Definition 2 (Correctness). A verifiable computation protocol VC is correct
for a class of functions $\mathcal{F}$ if for any $F \in \mathcal{F}$, any pair of keys
$(PK_F, EK_F) \leftarrow \mathsf{KeyGen}(F, 1^\lambda)$, any $x \in \mathrm{Domain}(F)$,
any $(\sigma_x, VK_x) \leftarrow \mathsf{ProbGen}(PK_F, x)$, and any $\sigma_{out} \leftarrow \mathsf{Compute}(EK_F, \sigma_x)$,
the verification algorithm Verify on input $VK_x$ and $\sigma_{out}$ outputs $y = F(x)$.


Providing public delegation and verification introduces a new threat model in 
which the worker knows both the public key $PK_F$ (which allows him to delegate
computations) and the verification key $VK_x$ for the challenge input x (which
allows him to check whether his answers will pass the verification). 


Definition 3 (Security). Let VC be a public verifiable computation scheme
for a class of functions $\mathcal{F}$, and let $A = (A_1, A_2)$ be any pair of probabilistic
polynomial-time machines. Consider the experiment $\mathbf{Exp}^{PubVerif}_A[VC, F, \lambda]$ for
any $F \in \mathcal{F}$ below:


Experiment $\mathbf{Exp}^{PubVerif}_A[VC, F, \lambda]$
$(PK_F, EK_F) \leftarrow \mathsf{KeyGen}(F, 1^\lambda)$;
$(x^*, state) \leftarrow A_1(PK_F, EK_F)$;
$(\sigma_{x^*}, VK_{x^*}) \leftarrow \mathsf{ProbGen}(PK_F, x^*)$;
$\sigma^*_{out} \leftarrow A_2(state, \sigma_{x^*}, VK_{x^*})$;
$y^* \leftarrow \mathsf{Verify}(VK_{x^*}, \sigma^*_{out})$;
If $y^* \neq \perp$ and $y^* \neq F(x^*)$, output ‘1’, else output ‘0’;


A public verifiable computation scheme VC is secure for a class of functions $\mathcal{F}$
if for every function $F \in \mathcal{F}$ and every p.p.t. adversary $A = (A_1, A_2)$:

$\Pr[\mathbf{Exp}^{PubVerif}_A[VC, F, \lambda] = 1] \leq \mathrm{negl}(\lambda)$, (1)


where negl denotes a negligible function of its input.
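
The experiment above translates almost line by line into a small test harness. The following sketch is illustrative only: the `Scheme` container, the two toy schemes (`sound` and `gullible`) and the adversary pair `a1`, `a2` are all hypothetical names introduced here, functions are Python callables on integers, and `None` stands for $\perp$. Compute does not appear because the adversary plays the role of the worker.

```python
from typing import Any, Callable, NamedTuple, Optional, Tuple


class Scheme(NamedTuple):
    key_gen: Callable[..., Tuple[Any, Any]]
    prob_gen: Callable[[Any, int], Tuple[Any, Any]]
    verify: Callable[[Any, Any], Optional[int]]


def exp_pub_verif(vc: Scheme, f, a1, a2, lam: int = 128) -> int:
    """Run one execution of the experiment; 1 means the adversary cheated."""
    pk_f, ek_f = vc.key_gen(f, lam)
    x_star, state = a1(pk_f, ek_f)             # adversary picks the challenge
    sigma_x, vk_x = vc.prob_gen(pk_f, x_star)
    sigma_out = a2(state, sigma_x, vk_x)       # adversary sees VK too
    y = vc.verify(vk_x, sigma_out)
    return 1 if (y is not None and y != f(x_star)) else 0


# Two toy schemes sharing KeyGen/ProbGen; only Verify differs.
def _key_gen(f, lam):
    return f, f


def _prob_gen(pk_f, x):
    return x, pk_f(x)  # VK_x is F(x) itself (inefficient, for illustration)


sound = Scheme(_key_gen, _prob_gen, lambda vk, out: out if out == vk else None)
gullible = Scheme(_key_gen, _prob_gen, lambda vk, out: out)  # accepts anything


def a1(pk_f, ek_f):
    x_star = 5
    return x_star, ek_f(x_star)  # remember the correct answer as state


def a2(state, sigma_x, vk_x):
    return 1 - state  # return the flipped bit, i.e. a wrong answer
```

Against `gullible` the adversary wins every run, while against `sound` the flipped answer is rejected and the experiment outputs 0.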


Later, we will also briefly consider a weaker notion of “selective security” which
requires the adversary to declare the challenge input $x^*$ before it sees $PK_F$.

For verifiable outsourcing of a function to make sense, the client must use “less
resources” than what is required to compute the function. “Resources” here could
mean the running time, the randomness complexity, space, or the depth of the
computation. We retain the earlier efficiency requirements: namely, we require
the complexity of ProbGen and Verify combined to be less than that of F.
However, for KeyGen, we ask only that the complexity be poly(|F|). Thus, we employ
an amortized complexity model, in which the client invests a larger amount of
computational work in an “offline” phase in order to obtain efficiency during the
“online” phase. We provide two strong definitions of efficiency: one that talks
about the running time and a second that talks about computation depth.


Definition 4 (Efficiency). A verifiable computation protocol VC is efficient for
a class of functions $\mathcal{F}$ that act on $n = n(\lambda)$ bits if there is a polynomial p such that:
— the running time of ProbGen and Verify together is at most $p(n, \lambda)$, the rest
of the algorithms are probabilistic polynomial-time, and
— there exists a function $F \in \mathcal{F}$ whose running time is $\omega(p(n, \lambda))$.
In a similar vein, VC is depth-efficient if the computation depth of ProbGen and
Verify combined (written as Boolean circuits) is at most $p(n, \lambda)$, whereas there is
a function $F \in \mathcal{F}$ whose computation depth is $\omega(p(n, \lambda))$.


We now define the notion of unbounded circuit families which will be helpful in 
quantifying the efficiency of our verifiable computation protocols. 


Definition 5. We define a family of circuits $\{\mathcal{C}_n\}_{n \in \mathbb{N}}$ to be unbounded if for
every polynomial p and all but finitely many n, there is a circuit $C \in \mathcal{C}_n$ of size
at least p(n). We call the family depth-unbounded if for every polynomial p and
all but finitely many n, there is a circuit $C \in \mathcal{C}_n$ of depth at least p(n).


2.2  Key-Policy Attribute-Based Encryption 


Introduced by Goyal, Pandey, Sahai and Waters [15], Key-Policy Attribute-Based
Encryption (KP-ABE) is a special type of encryption scheme where a
Boolean function F is associated with each user’s key, and a set of attributes
(denoted as a string $x \in \{0,1\}^n$) with each ciphertext. A key $SK_F$ for a function
F will decrypt a ciphertext corresponding to attributes x if and only if F(x) = 1. 


2 To be completely precise, one has to talk about a family $\mathcal{F} = \{\mathcal{F}_n\}_{n \in \mathbb{N}}$ parameterized
by the input length n. We simply speak of $\mathcal{F}$ to implicitly mean $\mathcal{F}_n$ whenever there
is no cause for confusion. 

3 This condition is to rule out trivial protocols, e.g., for a class of functions that can 
be computed in time less than $p(\lambda)$.
KP-ABE can be thought of as a special case of predicate encryption or functional
encryption, although we note that a KP-ABE ciphertext need not hide
the associated policy or attributes. We will refer to KP-ABE simply as ABE
from now on. We state the formal definition below, adapted from prior work.


Definition 6 (Attribute-Based Encryption). An attribute-based encryption
scheme ABE for a class of functions $\mathcal{F} = \{\mathcal{F}_n\}_{n \in \mathbb{N}}$ (where functions in $\mathcal{F}_n$ take n
bits as input) is a tuple of algorithms (Setup, Enc, KeyGen, Dec) that work as follows:
— $(PK, MSK) \leftarrow \mathsf{Setup}(1^\lambda, 1^n)$: Given a security parameter $\lambda$ and an index
n for the family $\mathcal{F}_n$, output a public key PK and a master secret key MSK.
— $C \leftarrow \mathsf{Enc}(PK, M, x)$: Given a public key PK, a message M in the message
space MsgSp, and attributes $x \in \{0,1\}^n$, output a ciphertext C.
— $SK_F \leftarrow \mathsf{KeyGen}(MSK, F)$: Given a function F and the master secret key
MSK, output a decryption key $SK_F$ associated with F.
— $\mu \leftarrow \mathsf{Dec}(SK_F, C)$: Given a ciphertext $C \in \mathsf{Enc}(PK, M, x)$ and a secret key
$SK_F$ for function F, output a message $\mu \in MsgSp$ or $\mu = \perp$.


Definition 7 (ABE Correctness). Correctness of the ABE scheme requires
that for all $(PK, MSK) \leftarrow \mathsf{Setup}(1^\lambda, 1^n)$, all $M \in MsgSp$, all $x \in \{0,1\}^n$, all
ciphertexts $C \leftarrow \mathsf{Enc}(PK, M, x)$ and all secret keys $SK_F \leftarrow \mathsf{KeyGen}(MSK, F)$,
the decryption algorithm $\mathsf{Dec}(SK_F, C)$ outputs M if F(x) = 1 and $\perp$ if F(x) = 0.
(This definition could be relaxed to hold with high probability over the keys
(PK, MSK), which suffices for our purposes.)


We define a natural, yet relaxed, notion of security for ABE schemes which we 
refer to as “one-key security”. Roughly speaking, we require that adversaries 
who obtain a single secret key $SK_F$ for any function F of their choice and
a ciphertext $C \leftarrow \mathsf{Enc}(PK, M, x)$ associated with any attributes x such that
F(x) = 0 should not be able to violate the semantic security of C. We note 
that much work in the ABE literature has been devoted to achieving a strong 
form of security against collusion, where the adversary obtains not just a single 
secret key, but polynomially many of them for functions of its choice. We do not 
require such a strong notion for our purposes. 


Definition 8 (One-Key Security for ABE). Let ABE be a key-policy
attribute-based encryption scheme for a class of functions $\mathcal{F} = \{\mathcal{F}_n\}_{n \in \mathbb{N}}$, and
let $A = (A_0, A_1, A_2)$ be a three-tuple of probabilistic polynomial-time machines.
We define security via the following experiment.


Experiment $\mathbf{Exp}^{ABE}_{A}[ABE, n, \lambda]$
$(PK, MSK) \leftarrow \mathsf{Setup}(1^\lambda, 1^n)$;
$(F, state_1) \leftarrow A_0(PK)$;
$SK_F \leftarrow \mathsf{KeyGen}(MSK, F)$;
$(M_0, M_1, x^*, state_2) \leftarrow A_1(state_1, SK_F)$;
$b \leftarrow \{0,1\}$; $C \leftarrow \mathsf{Enc}(PK, M_b, x^*)$;
$b' \leftarrow A_2(state_2, C)$;
If $b = b'$, output ‘1’, else ‘0’;
The experiment is valid if $M_0, M_1 \in \mathsf{MsgSp}$ and $|M_0| = |M_1|$. We define the
advantage of the adversary in all valid experiments as


$\mathbf{Adv}^{ABE}_{A}(ABE, n, \lambda) = |\Pr[b = b'] - 1/2|$.
We say that ABE is a one-key secure ABE scheme if $\mathbf{Adv}^{ABE}_{A}(ABE, n, \lambda) < \mathsf{negl}(\lambda)$.
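The experiment of Definition 8 can be sketched as a small test harness. The code below is an illustration under stated assumptions: the `Scheme` container, the function names, and the deliberately broken `LEAKY` scheme are all hypothetical and exist only to exercise the harness. The validity check also enforces $F(x^*) = 0$, which the informal description above requires.

```python
import secrets
from typing import Callable, NamedTuple

class Scheme(NamedTuple):
    setup: Callable    # (lam, n) -> (PK, MSK)
    keygen: Callable   # (MSK, F) -> SK_F
    enc: Callable      # (PK, M, x) -> C

def one_key_experiment(scheme, A0, A1, A2, lam, n) -> int:
    """Run the one-key experiment once; return 1 if the adversary guesses b."""
    pk, msk = scheme.setup(lam, n)
    F, state1 = A0(pk)
    sk_F = scheme.keygen(msk, F)             # the single key the adversary gets
    M0, M1, x_star, state2 = A1(state1, sk_F)
    if len(M0) != len(M1) or F(x_star) != 0:
        raise ValueError("invalid experiment")
    b = secrets.randbelow(2)
    C = scheme.enc(pk, (M0, M1)[b], x_star)
    return 1 if A2(state2, C) == b else 0

def estimate_advantage(scheme, A0, A1, A2, lam=128, n=1, trials=200) -> float:
    """Empirical estimate of |Pr[b = b'] - 1/2| over independent runs."""
    wins = sum(one_key_experiment(scheme, A0, A1, A2, lam, n)
               for _ in range(trials))
    return abs(wins / trials - 0.5)

# Hypothetical, deliberately broken scheme: the "ciphertext" is the plaintext.
LEAKY = Scheme(
    setup=lambda lam, n: (b"pk", b"msk"),
    keygen=lambda msk, F: F,
    enc=lambda pk, M, x: (x, M),
)

def A0(pk):
    return (lambda x: 0), None        # ask for a key that decrypts nothing

def A1(state, sk_F):
    return b"0", b"1", (0,), None     # two equal-length messages, attribute x*

def A2(state, C):
    return 0 if C[1] == b"0" else 1   # read the leaked plaintext
```

Against `LEAKY` the adversary wins every run, so the estimate is exactly 0.5, the largest possible value. A one-key secure scheme would keep this quantity negligible for every efficient adversary.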


3 Verifiable Computation from ABE 


In Section 3.1 we present our main construction and proof, while Section 3.2
contains the various instantiations of our main construction and the concrete
verifiable computation protocols that we obtain as a result.


3.1 Main Construction 


Theorem 2. Let $\mathcal{F}$ be a class of Boolean functions (implemented by a family
of circuits $\mathcal{C}$), and let $\overline{\mathcal{F}} = \{\overline{F} \mid F \in \mathcal{F}\}$, where $\overline{F}$ denotes the complement of the
function $F$. Let ABE be an attribute-based encryption scheme that is one-key
secure (see Definition 8) for $\mathcal{F} \cup \overline{\mathcal{F}}$, and let $g$ be any one-way function.

Then, there is a verifiable computation protocol VC (secure under Definition 3)
for $\mathcal{F}$. If the circuit family $\mathcal{C}$ is unbounded (resp. depth-unbounded), then the
protocol VC is efficient (resp. depth-efficient) in the sense of Definition 4.


We first present our verifiable computation protocol. 

Let ABE = (ABE.Setup, ABE.KeyGen, ABE.Enc, ABE.Dec) be an attribute-based
encryption scheme for the class of functions $\mathcal{F} \cup \overline{\mathcal{F}}$. Then, the verifiable
computation protocol VC = (VC.KeyGen, ProbGen, Compute, Verify) for $\mathcal{F}$ works
as follows. We assume, without loss of generality, that the message space $\mathcal{M}$ of
the ABE scheme has size $2^\lambda$.

Key Generation VC.KeyGen: The client, on input a function $F \in \mathcal{F}$ with
input length $n$, runs the ABE setup algorithm twice, to generate two
independent key-pairs


$(msk_0, mpk_0) \leftarrow \mathsf{ABE.Setup}(1^\lambda, 1^n)$ and $(msk_1, mpk_1) \leftarrow \mathsf{ABE.Setup}(1^\lambda, 1^n)$.
Generate two secret keys $sk_{\overline{F}} \leftarrow \mathsf{ABE.KeyGen}(msk_0, \overline{F})$ (corresponding to
$\overline{F}$) and $sk_F \leftarrow \mathsf{ABE.KeyGen}(msk_1, F)$ (corresponding to $F$).


Output the pair $(sk_{\overline{F}}, sk_F)$ as the evaluation key and $(mpk_0, mpk_1)$ as the
public key.
Delegation ProbGen: The client, on input $x$ and the public key $PK$, samples
two uniformly random messages $m_0, m_1 \leftarrow \mathcal{M}$ and computes the ciphertexts


$CT_0 \leftarrow \mathsf{ABE.Enc}(mpk_0, m_0, x)$ and $CT_1 \leftarrow \mathsf{ABE.Enc}(mpk_1, m_1, x)$.


Output the message $\sigma_x = (CT_0, CT_1)$ (to be sent to the server), and the
verification key $VK_x = (g(m_0), g(m_1))$, where $g$ is the one-way function.


4 We denote the VC key generation algorithm as VC.KeyGen in order to avoid confusion 
with the ABE key generation algorithm. 
Computation Compute: The server, on receiving the ciphertexts $(CT_0, CT_1)$
and the evaluation key $EK = (sk_{\overline{F}}, sk_F)$, computes


$\mu_0 \leftarrow \mathsf{ABE.Dec}(sk_{\overline{F}}, CT_0)$ and $\mu_1 \leftarrow \mathsf{ABE.Dec}(sk_F, CT_1)$


and sends $\sigma_{out} = (\mu_0, \mu_1)$ to the client.
Verification Verify: On receiving $VK_x = (v_0, v_1)$ and $\sigma_{out} = (\mu_0, \mu_1)$, output


$$y = \begin{cases} 0 & \text{if } g(\mu_0) = v_0 \text{ and } g(\mu_1) \neq v_1 \\ 1 & \text{if } g(\mu_1) = v_1 \text{ and } g(\mu_0) \neq v_0 \\ \bot & \text{otherwise} \end{cases}$$
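The four VC algorithms can be sketched end to end as follows. This is an illustration, not the authors' implementation: the `abe_*` functions are an insecure placeholder that only mirrors the ABE syntax, SHA-256 stands in for the one-way function $g$ (an assumption), and all function names are hypothetical. The sketch pairs $mpk_0$ with the complement $\overline{F}$ and $mpk_1$ with $F$, which is the assignment under which the output of Verify agrees with $F(x)$.

```python
import hashlib
import secrets

LAM = 16  # message length in bytes (toy security parameter)

# --- insecure placeholder for a one-key ABE scheme (syntax only) ---
def abe_setup():
    msk = secrets.token_bytes(LAM)
    return msk, msk  # (msk, mpk). TOY ONLY: mpk must not equal msk in practice.

def abe_keygen(msk, F):
    return (F, msk)

def _pad(key, nonce, data):
    stream = hashlib.sha256(key + nonce).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

def abe_enc(mpk, m, x):
    nonce = secrets.token_bytes(16)
    return (x, nonce, _pad(mpk, nonce, m))

def abe_dec(sk, ct):
    F, msk = sk
    x, nonce, body = ct
    return _pad(msk, nonce, body) if F(x) == 1 else None  # None models bottom

def g(m):
    # Stand-in one-way function; by convention g(bottom) = bottom.
    return None if m is None else hashlib.sha256(b"owf" + m).digest()

# --- the VC protocol built on top of the ABE interface ---
def vc_keygen(F):
    msk0, mpk0 = abe_setup()
    msk1, mpk1 = abe_setup()
    F_bar = lambda x: 1 - F(x)                        # complement of F
    ek = (abe_keygen(msk0, F_bar), abe_keygen(msk1, F))
    return (mpk0, mpk1), ek                           # (public key, evaluation key)

def prob_gen(pk, x):
    m0, m1 = secrets.token_bytes(LAM), secrets.token_bytes(LAM)
    sigma_x = (abe_enc(pk[0], m0, x), abe_enc(pk[1], m1, x))
    vk_x = (g(m0), g(m1))
    return sigma_x, vk_x

def compute(ek, sigma_x):
    return (abe_dec(ek[0], sigma_x[0]), abe_dec(ek[1], sigma_x[1]))

def verify(vk_x, sigma_out):
    (v0, v1), (mu0, mu1) = vk_x, sigma_out
    ok0, ok1 = g(mu0) == v0, g(mu1) == v1
    if ok0 and not ok1:
        return 0
    if ok1 and not ok0:
        return 1
    return None  # reject
```

With `F = lambda x: x[0] ^ x[1]`, running `prob_gen`, `compute`, and `verify` in sequence returns `F(x)` on every input, while a response such as `(None, None)` is rejected.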


Remark 1. Whereas our main construction requires only an ABE scheme, using
an attribute-hiding ABE scheme (a notion often associated with predicate
encryption schemes) would also give us input privacy, since we encode the
function’s input in the attribute corresponding to a ciphertext.


Remark 2. To obtain a VC protocol for functions with multi-bit output, we
repeat this protocol (including the key generation algorithm) independently for
every output bit. To achieve better efficiency, if the ABE scheme supports
attribute hiding for a class of functions that includes message authentication codes
(MAC), then we can define $F'(x) = \mathrm{MAC}_K(F(x))$ and verify $F'$ instead,
similar to the constructions suggested by Applebaum, Ishai, and Kushilevitz [2] and
Barbosa and Farshim.


Remark 3. The construction above requires the verifier to trust the party that
ran ProbGen. This can be remedied by having ProbGen produce a non-interactive
zero-knowledge proof of correctness of the verification key $VK_x$. While
theoretically efficient, the practicality of this approach depends on the particular
ABE scheme and the NP language in question.


Proof of Correctness: The correctness of the VC scheme above follows from:

— If $F(x) = 0$, then $\overline{F}(x) = 1$ and thus, the algorithm Compute outputs
$\mu_0 = m_0$ and $\mu_1 = \bot$. The algorithm Verify outputs $y = 0$ since
$g(\mu_0) = g(m_0)$ but $g(\mu_1) = \bot \neq g(m_1)$, as expected.

— Similarly, if $F(x) = 1$, then $\overline{F}(x) = 0$ and thus, the algorithm Compute
outputs $\mu_1 = m_1$ and $\mu_0 = \bot$. The algorithm Verify outputs $y = 1$ since
$g(\mu_1) = g(m_1)$ but $g(\mu_0) = \bot \neq g(m_0)$, as expected.

□
We now consider the relation between the efficiency of the algorithms for the 
underlying ABE scheme and the efficiency for the resulting VC scheme. Since the 
algorithms Compute and Verify can potentially be executed by different parties, 
we consider their efficiency separately. It is easily seen that: 


— The running time of the VC key generation algorithm VC.KeyGen is twice 
that of ABE.Setup plus ABE.KeyGen. 


5 As a convention, we assume that $g(\bot) = \bot$.
— The running time of Compute is twice that of ABE.Dec.


— The running time of ProbGen is twice that of ABE.Enc, and the running time
of Verify is the same as that of computing the one-way function.


In short, the combined running time of ProbGen and Verify is polynomial in their
input lengths, namely $p(n, \lambda)$, where $p$ is a fixed polynomial, $n$ is the length of
the input to the functions, and $\lambda$ is the security parameter. Assuming that $\mathcal{F}$ is
an unbounded class of functions (according to Definition 5), it contains functions
that take longer than $p(n, \lambda)$ to compute, and thus our VC scheme is efficient in
the sense of Definition 4. (Similar considerations apply to depth-efficiency.)


We now turn to showing the security of the VC scheme under Definition 3. We
show that an attacker against the VC protocol must either break the security of
the one-way function $g$ or the one-key security of the ABE scheme.


Proof of Security: Let $A = (A_1, A_2)$ be an adversary against the VC scheme
for a function $F \in \mathcal{F}$. We construct an adversary $B = (B_0, B_1, B_2)$ that breaks
the one-key security of the ABE, working as follows. (For notational simplicity,
given a function $F$, we let $F_0 = \overline{F}$ and $F_1 = F$.)


1. $B_0$ first tosses a coin to obtain a bit $b \in \{0,1\}$. (Informally, the bit $b$
corresponds to $B$’s guess of whether the adversary $A$ will cheat by producing an
input $x$ such that $F(x) = 1$ or $F(x) = 0$, respectively.)

$B_0$ outputs the function $F_b$, as well as the bit $b$ as part of the state.

2. $B_1$ obtains the master public key $mpk$ of the ABE scheme and the secret key
$sk_{F_b}$ for the function $F_b$. Set $mpk_b = mpk$.

Run the ABE setup and key generation algorithms to generate a master
public key $mpk'$ and a secret key $sk_{F_{1-b}}$ for the function $F_{1-b}$ under $mpk'$.
Set $mpk_{1-b} = mpk'$.

Let $(mpk_0, mpk_1)$ be the public key for the VC scheme and $(sk_{F_0}, sk_{F_1})$ be
the evaluation key. Run the algorithm $A_1$ on input the public and evaluation
keys and obtain a challenge input $x^*$ as a result.

If $F(x^*) = b$, output a uniformly random bit and stop. Otherwise, $B_1$
now chooses two uniformly random messages $M^{(b)}, \rho \leftarrow \mathcal{M}$ and outputs
$(M^{(b)}, \rho, x^*)$ together with its internal state.

3. $B_2$ obtains a ciphertext $C^{(b)}$ (which is an encryption of either $M^{(b)}$ or $\rho$
under the public key $mpk_b$ and attribute $x^*$).

$B_2$ constructs an encryption $C^{(1-b)}$ of a uniformly random message $M^{(1-b)}$
under the public key $mpk_{1-b}$ and attribute $x^*$.

Run $A_2$ on input $\sigma_{x^*} = (C^{(0)}, C^{(1)})$ and $VK_{x^*} = (g(M^{(0)}), g(M^{(1)}))$, where
$g$ is the one-way function. As a result, $A_2$ returns $\sigma_{out}$.

If $\mathsf{Verify}(VK_{x^*}, \sigma_{out}) = b$, output 0 and stop. Otherwise, output a uniformly
random bit, which is the behavior the probability analysis below relies on.


We now claim the algorithms $(B_0, B_1, B_2)$ described above distinguish between
the encryption of $M^{(b)}$ and the encryption of $\rho$ in the ABE security game with
non-negligible advantage.
We consider two cases.


Case 1: $C^{(b)}$ is an encryption of $M^{(b)}$. In this case, $B$ presents to $A$ a perfect
view of the execution of the VC protocol, meaning that $A$ will cheat with
probability $1/p(\lambda)$ for some polynomial $p$.

Cheating means one of two things. Either $F(x^*) = b$ and the adversary
produced an inverse of $g(M^{(1-b)})$ (causing the Verify algorithm to output
$1 - b$), or $F(x^*) = 1 - b$ and the adversary produced an inverse of $g(M^{(b)})$
(causing the Verify algorithm to output $b$).

In the former case, $B$ outputs a uniformly random bit, and in the latter
case, it outputs 0, the correct guess as to which message was encrypted.
Thus, the overall probability that $B$ outputs 0 is $1/2 + 1/p(\lambda)$.

Case 2: $C^{(b)}$ is an encryption of the message $\rho$. In this case, as above, $B$
outputs a random bit if $F(x^*) = b$. Otherwise, the adversary $A$ has to
produce $\sigma_{out}$ that makes the verifier output $b$, namely a string $\sigma_{out}$ such that
$g(\sigma_{out}) = g(M^{(b)})$, while given only $g(M^{(b)})$ (and some other information
that is independent of $M^{(b)}$).

This amounts to inverting the one-way function, which $A$ can only do with
negligible probability. (Formally, if the adversary wins in this game with
non-negligible probability, then we can construct an inverter for the one-way
function $g$.)

The bottom line is that $B$ outputs 0 in this case with probability
$1/2 + \mathsf{negl}(\lambda)$.


This shows that $B$ breaks the one-key security of the ABE scheme with a
non-negligible advantage $1/p(\lambda) - \mathsf{negl}(\lambda)$. □
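The two cases can be combined with explicit constants. The derivation below is a sketch under three assumptions that are not spelled out in the text: $B$ outputs a uniformly random bit whenever it does not output 0 deterministically, the coin $b$ is independent of $A$'s view in Case 1 (both key pairs are identically distributed), and $B$'s output 0 is read as the guess that $M^{(b)}$ was encrypted. The symbols $\varepsilon$, $\delta$, and $E$ are introduced here for illustration; the constant factors are absorbed into the polynomial $p$ used in the proof.

```latex
% \varepsilon : probability that A cheats in a real VC execution (Case 1)
% \delta      : probability of the event E in Case 2 (negligible, by one-wayness of g)
% E           : the event  F(x^*) = 1-b  and  Verify(VK_{x^*}, \sigma_{out}) = b
\begin{align*}
\Pr[B \text{ outputs } 0]
  &= \Pr[E] + \tfrac{1}{2}\bigl(1 - \Pr[E]\bigr)
   = \tfrac{1}{2} + \tfrac{1}{2}\Pr[E], \\
\text{Case 1:}\quad \Pr[E] &= \tfrac{\varepsilon}{2}
  \;\Longrightarrow\; \Pr[B \text{ outputs } 0] = \tfrac{1}{2} + \tfrac{\varepsilon}{4}, \\
\text{Case 2:}\quad \Pr[E] &= \delta
  \;\Longrightarrow\; \Pr[B \text{ outputs } 0] = \tfrac{1}{2} + \tfrac{\delta}{2}, \\
\mathbf{Adv}^{ABE}_{B}
  &= \Bigl|\tfrac{1}{2}\bigl(\tfrac{1}{2} + \tfrac{\varepsilon}{4}\bigr)
     + \tfrac{1}{2}\bigl(\tfrac{1}{2} - \tfrac{\delta}{2}\bigr) - \tfrac{1}{2}\Bigr|
   = \Bigl|\tfrac{\varepsilon}{8} - \tfrac{\delta}{4}\Bigr|.
\end{align*}
```

If $\varepsilon$ is non-negligible and $\delta$ is negligible, the advantage remains non-negligible, which matches the conclusion of the proof.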


Remark 4. If we employ an ABE scheme that is selectively secure, then the 
construction and proof above still go through if we adopt a notion of “selectively- 
secure” verifiable computation in which the VC adversary commits in advance 
to the input on which he plans to cheat. 


3.2 Instantiations 


We describe two different instantiations of our main construction. 


Efficient Selectively Secure VC Scheme for Formulas. The first instantiation uses
the (selectively secure) ABE scheme of Ostrovsky, Sahai and Waters for the
class of (not necessarily monotone) polynomial-size Boolean formulas (which
itself is an adaptation of the scheme of Goyal et al., which only supports
monotone formulas). This results in a selectively secure public VC scheme for
the same class of functions, by invoking Theorem 2. Recall that selective security


6 Goyal et al.’s scheme can also be made to work if we use De Morgan’s law to
transform $f$ and $\overline{f}$ into equivalent monotone formulas in which some variables may
be negated. We then double the number of variables, so that for each variable $v$, we
have one variable representing $v$ and one representing its negation $\overline{v}$. Given an input
$x$, we choose an attribute such that all of these variables are set correctly.
in the context of verifiable computation means that the adversary has to declare
the input on which she cheats at the outset, before she sees the public key and
the evaluation key.
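The De Morgan transformation with doubled variables, mentioned in the footnote on Goyal et al.’s scheme, can be illustrated with a short sketch. The tuple encoding of formulas and all function names below are assumptions made for this example. Variable $i$ of the doubled attribute stands for $v_i$ and variable $n + i$ for its negation.

```python
from itertools import product

# Formulas are nested tuples: ("var", i), ("not", f), ("and", f, g), ("or", f, g).

def evaluate(f, x):
    op = f[0]
    if op == "var":
        return x[f[1]]
    if op == "not":
        return 1 - evaluate(f[1], x)
    if op == "and":
        return evaluate(f[1], x) & evaluate(f[2], x)
    if op == "or":
        return evaluate(f[1], x) | evaluate(f[2], x)
    raise ValueError(f"unknown operator {op!r}")

def to_monotone(f, n, negate=False):
    """Push negations to the leaves; negated leaves use variable index i + n."""
    op = f[0]
    if op == "var":
        return ("var", f[1] + n) if negate else ("var", f[1])
    if op == "not":
        return to_monotone(f[1], n, not negate)
    if op not in ("and", "or"):
        raise ValueError(f"unknown operator {op!r}")
    # De Morgan: a pending negation swaps AND and OR.
    swapped = {"and": "or", "or": "and"}
    new_op = swapped[op] if negate else op
    return (new_op, to_monotone(f[1], n, negate), to_monotone(f[2], n, negate))

def doubled_attribute(x):
    """Attribute of length 2n: the input bits followed by their negations."""
    return tuple(x) + tuple(1 - b for b in x)

def is_negation_free(f):
    return f[0] != "not" and all(
        is_negation_free(c) for c in f[1:] if isinstance(c, tuple))

def check_equivalence(f, n):
    """Check that the monotone forms of f and its complement behave correctly."""
    pos, neg = to_monotone(f, n), to_monotone(f, n, negate=True)
    return is_negation_free(pos) and is_negation_free(neg) and all(
        evaluate(pos, doubled_attribute(x)) == evaluate(f, x)
        and evaluate(neg, doubled_attribute(x)) == 1 - evaluate(f, x)
        for x in product((0, 1), repeat=n))
```

Calling `to_monotone(f, n, negate=True)` yields the monotone formula for the complement of `f`, so a scheme limited to monotone formulas can still issue keys for both functions needed by the main construction.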

The efficiency of the resulting VC scheme for Boolean formulas is as follows:
for a Boolean formula $C$, KeyGen runs in time $|C| \cdot \mathrm{poly}(\lambda)$; ProbGen runs in time
$|x| \cdot \mathrm{poly}(\lambda)$, where $|x|$ is the length of the input to the formula; Compute runs in
time $|C| \cdot \mathrm{poly}(\lambda)$; and Verify runs in time $O(\lambda)$. In other words, the total work
for delegation and verification is $|x| \cdot \mathrm{poly}(\lambda)$ which is, in general, more efficient
than the work required to evaluate the circuit $C$. Thus, the scheme is efficient
in the sense of Definition 4. The drawback of this instantiation is that it is only
selectively secure.

Recently, there have been constructions of fully secure ABE for formulas start- 
ing from the work of Lewko et al. [19] which, one might hope, leads to a fully 
secure VC scheme. Unfortunately, all known constructions of fully secure ABE 
work for bounded classes of functions. For example, in the construction of Lewko 
et al., once a bound B is fixed, one can design the parameters of the scheme 
so that it works for any formula of size at most B. Furthermore, implicit in 
the work of Sahai and Seyalioglu is a construction of an (attribute-hiding, 
one-key secure) ABE scheme for bounded polynomial-size circuits (as opposed to 
formulas). 

These constructions, unfortunately, do not give us efficient VC protocols. The
reason is simply this: the encryption algorithm in these schemes runs in time
polynomial (certainly, at least linear) in $B$. Translated to a VC protocol using
Theorem 2, this results in the client running for time $\Omega(B)$, which is useless,
since given that much time, he could have computed any circuit of size at most
$B$ by himself!

Essentially, the VC protocol that emerges from Theorem 2 is non-trivial if
the encryption algorithm of the ABE scheme for the function family $\mathcal{F}$ is (in
general) more efficient than computing functions in $\mathcal{F}$.


Depth-Efficient Adaptively Secure VC Scheme for Arbitrary Functions. Although
the (attribute-hiding, one-key secure) ABE construction of Sahai and Seyalioglu
mentioned above does not give us an efficient VC scheme, it does result
in a depth-efficient VC scheme for the class of polynomial-size circuits. Roughly
speaking, the construction is based on Yao’s Garbled Circuits, and involves an
ABE encryption algorithm that constructs a garbled circuit for the function $F$
in question. Even though this computation takes at least as much time as
computing the circuit for $F$, the key observation is that it can be done in parallel.
In short, going through the VC construction in Theorem 2, one can see that
both the Compute and Verify algorithms can be implemented in constant depth
(for appropriate encryption schemes and one-way functions, e.g., the ones that
result from the AIK transformation [1]), which is much faster in parallel than
computing $F$, in general.

Interestingly, the VC protocol thus derived is very similar to the protocol of
Applebaum, Ishai and Kushilevitz [2]. We refer the reader to [2] for details.
We believe that this scheme also illuminates an interesting point: unlike other
ABE schemes, this ABE scheme is only one-key secure, which suffices
for verifiable computation. This relaxation may point the way towards an ABE-
based VC construction that achieves generality, efficiency, and adaptive security.


4 Conclusions and Future Work 


In this work, we introduced new notions for verifiable computation: public dele- 
gatability and public verifiability. We demonstrated a somewhat surprising con- 
struction of a public verifiable computation protocol from any (one-key secure) 
attribute-based encryption (ABE) scheme. 

Our work leaves open several interesting problems. Perhaps the main open 
question is the design of one-key secure ABE schemes for general, unbounded 
classes of functions. Is it possible to come up with such a scheme for the class 
of all polynomial-size circuits (as opposed to circuits with an a priori bound on
the size, as in [24])? Given the enormous research effort in the ABE literature
devoted to achieving the strong notion of security against collusion, our work 
points out that achieving even security against the compromise of a single key 
is a rather interesting question to investigate! 
Abstract. We prove that there is no black-box construction of a threshold
predicate encryption system from identity-based encryption. Our result
signifies nontrivial progress in a line of research suggested by Boneh,
Sahai and Waters (TCC ’11), where they proposed a study of the relative
power of predicate encryption for different functionalities. We rely on and
extend the techniques of Boneh et al. (FOCS ’08), where they give a
black-box separation of identity-based encryption from trapdoor permutations.

In contrast to previous results where only trapdoor permutations were 
used, our starting point is a more powerful primitive, namely identity- 
based encryption, which allows planting exponentially many trapdoors in 
the public-key by only planting a single master public-key of an identity- 
based encryption system. This makes the combinatorial aspect of our 
black-box separation result much more challenging. Our work gives the 
first impossibility result on black-box constructions of any cryptographic 
primitive from identity-based encryption. 

We also study the more general question of constructing predicate 
encryption for a complexity class F, given predicate encryption for a 
(potentially less powerful) complexity class G. Toward that end, we rule 
out certain natural black-box constructions of predicate encryption for
$NC^1$ from predicate encryption for $AC^0$ assuming a widely believed
conjecture in communication complexity.


Keywords: Predicate Encryption, Black-Box Reductions, Identity- 
based Encryption, Communication Complexity. 


1 Introduction 


An encryption scheme enables a user to securely share data with other users.
Traditional methods based on Secret-Key Cryptography and Public-Key
Cryptography consider the scenarios where a user securely shares data with another fixed
user whose identity (characterized by the possession of the decryption-key) it
knows in advance. In particular, in these schemes, there is a bijection between the
encryption-key and the decryption-key, fixed by the chosen encryption scheme.

As systems and networks grow in complexity, and in particular with the
emergence of cloud computing, the above viewpoint may be too narrow to cover
many important applications. Often, a user might want to encrypt data to be
shared with a large set of other users based on some common “property”, or
attribute, they satisfy. Membership in this set may not be known to the
encryptor, or may not even be decidable in advance. Furthermore, a user might want to
share data selectively so different users are able to decrypt different parts of that
data. To cater to these scenarios, the notion of Predicate Encryption (or
Attribute-based Encryption) has recently emerged. Predicate encryption was introduced by
Sahai and Waters [31], and further developed in the work of Goyal et al. [17]. It
has been the subject of several recent works, e.g., [11,19,24,28,10]. Predicate
encryption is useful in a wide variety of applications; in particular, for fine-grained
access control. It has also been a useful technical tool in solving seemingly
unrelated problems, e.g., key escrow [15] and user revocation [5] in Identity-based
Encryption (IBE). IBE [32,8,12] can be seen as the most basic form of predicate
encryption, where the predicate corresponds to a point function.

A predicate encryption scheme is defined in terms of a family $F$ of Boolean
functions (predicates) on a universe $A$ of attributes. Decryption-keys are
associated to a predicate $f \in F$ and ciphertexts are labeled with (or are created based
on) an attribute string $a \in A$. A user with a decryption-key corresponding to
$f$ can decrypt a ciphertext labeled with $a$ if and only if $f(a) = 1$. As argued
by Boneh et al. [10], the key challenge in the study of predicate encryption (or 
Functional Encryption in general) is understanding what classes of functionali- 
ties F can be supported. If we could support any polynomial time computable 
predicate f, then any polynomial-time access control program that acts over a 
user’s credentials could be supported [10]. 

Unfortunately, the current state of the art is far from being able to support an ar- 
bitrary polynomial-time f. Given this, an important direction Boneh et al. sug- 
gested was to understand the relative strengths of predicate encryption schemes 
with respect to the functionalities they can support: When does a scheme for one 
functionality imply a scheme for another? In the absence of such a reduction, can 
we prove that predicate encryption for one functionality is inherently harder than 
for another? A meaningful approach to address this latter question is via black-box
separations [18]; see [30,27] for a comprehensive survey on the topic. A proof that
a cryptographic primitive $P_1$ cannot be constructed given black-box access to
another primitive $P_2$ (and of course without incurring any additional assumptions)
can be viewed as an indication that $P_1$ is in some sense a stronger primitive than
$P_2$. Hence, to construct $P_1$ one may have to look for more powerful techniques, or
stronger assumptions than for $P_2$ (or try non-black-box reductions). Thus,
studying these questions would help us better understand the extent to which techniques
for current predicate encryption systems might or might not be useful in obtain- 
ing systems for more general functionalities. The broad goal of this work is to make 
progress toward answering these questions. 
Since a predicate encryption scheme has an associated family $F$ of Boolean
functions, a natural way to classify such schemes is according to the complexity
class the corresponding family comes from. For example, we can call a scheme
$(A, F)$ an $AC^0$-PE scheme, if every member of $F$ can be computed by a
constant-depth polynomial size circuit (an $AC^0$ circuit) on an attribute string from $A$.
Hence, a concrete approach to compare predicate encryption schemes is to ask
questions of the kind: Given a predicate encryption scheme for predicates in
complexity class $G$, can we construct a scheme for predicates in a (potentially
larger) complexity class $F$ in a black-box way? For example, it is well-known that
the circuit class $NC^1$ is strictly larger than $AC^0$. Thus a concrete question is: Is
$NC^1$-predicate encryption provably harder than $AC^0$-predicate encryption with
respect to black-box reductions? A second aspect of our work is to try to relate
(perhaps conjectured) separations among Boolean function complexity classes to 
black-box separations among the corresponding predicate encryption schemes. 


1.1 Our Results 


Our main result is a black-box separation of threshold predicate encryption 
(TPE) from identity-based encryption (IBE) schemes. To our knowledge, this 
is the first result on the impossibility of constructing a cryptographic primitive 
from IBE in a black-box manner. Recall that IBE can be viewed as the most
basic form of predicate encryption in which the decryption tests exact equality 
(in other words, the predicate is a point function). Hence, the first natural step 
in the study of the above question is whether IBE can be used to construct 
more general predicate encryption systems. Our results show that IBE cannot 
be used to construct even a basic system for threshold predicates (introduced by 
Sahai and Waters [31]). We believe that the question of IBE vs. more advanced 
predicate encryption systems is of special interest. IBE as a primitive is very 
well studied [8,12,7,6,34,14], and constructions of IBE are now known based on
a variety of hardness assumptions. 

Returning to our more general question, we rule out certain “natural”
black-box constructions of predicate encryption for the class $NC^1$ from predicate
encryption for the class $AC^0$, assuming a widely believed conjecture in the area
of two-party communication complexity. Given black-box access to a predicate
encryption scheme for $(B, G)$, a natural way to construct a predicate encryption
scheme for a “larger” system $(A, F)$ is to use a Sharing-Based Construction as
follows. The decryption-key for an $f \in F$ is simply the set of decryption keys for
a set $S(f) = \{g_1, \dots, g_q\}$ of predicates $g_i \in G$ from the smaller system. Similarly,
for each attribute $a \in A$, we associate a set $S(a) = \{a_1, \dots, a_q\}$ of attributes
from $B$. To encrypt a message $m$ under an attribute $a$ for the big system, we
generate $q$ shares $m_1, \dots, m_q$ of $m$ and encrypt $m_j$ under the attribute $a_j$ of
the small system. The concatenation of these encrypted shares is the ciphertext
of $m$ under $a$. To decrypt, we try to decrypt each $m_j$ using the decryption keys
of each $g_i \in S(f)$. The sharing construction ensures that the shares $m_j$ that
are successfully decrypted, if any, in this process suffice to recover $m$. Thus the
sharing-based construction is a rather natural and obvious way to build
predicate encryption schemes for more complex functionalities from simpler ones.
Our result shows that such a sharing-based construction is impossible if $F$ is a
family of functions in $NC^1$ and $G$ is any family of functions from $AC^0$, assuming
certain conjectures in communication complexity. It is worth noting that
combinatorial arguments about sharing-based constructions form a core component
of our main result on (unrestricted) black-box separation of TPE from IBE.
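The shape of a sharing-based construction can be illustrated with a toy example. Everything below is an assumption made for illustration: the small system is an insecure IBE-like mock (the public key equals the master key), the sharing is a simple XOR sharing in which all shares are needed, and the resulting big-system predicate is "every identity attached to the ciphertext is covered by the key". It is not a construction from the paper.

```python
import hashlib
import hmac
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

# --- small system: insecure IBE-like mock (point-function predicates) ---
def small_setup():
    msk = secrets.token_bytes(16)
    return msk, msk  # (pk, msk). TOY ONLY: pk must not equal msk in practice.

def small_keygen(msk: bytes, ident: str) -> bytes:
    return hmac.new(msk, ident.encode(), hashlib.sha256).digest()

def small_enc(pk: bytes, ident: str, share: bytes):
    if len(share) > 32:
        raise ValueError("toy scheme supports shares of at most 32 bytes")
    key = small_keygen(pk, ident)  # possible only because pk == msk in this toy
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(key + nonce).digest()
    return (ident, nonce, xor(share, pad))

def small_dec(ident_key: bytes, ct) -> bytes:
    _, nonce, body = ct
    return xor(body, hashlib.sha256(ident_key + nonce).digest())

# --- big system: sharing-based construction on top of the small one ---
def big_keygen(msk: bytes, covered: set) -> dict:
    """Key for the predicate f_T: one small-system key per identity in T = S(f)."""
    return {t: small_keygen(msk, t) for t in covered}

def big_enc(pk: bytes, attribute: list, m: bytes) -> list:
    """Split m into XOR shares, one per small-system attribute in S(a)."""
    shares = [secrets.token_bytes(len(m)) for _ in attribute[:-1]]
    shares.append(reduce(xor, shares, m))  # all shares XOR to m
    return [small_enc(pk, a_j, m_j) for a_j, m_j in zip(attribute, shares)]

def big_dec(keys: dict, cts: list):
    shares = []
    for ct in cts:
        if ct[0] not in keys:
            return None  # a share cannot be opened, so m is not recovered
        shares.append(small_dec(keys[ct[0]], ct))
    return reduce(xor, shares)
```

In this toy, two keys that individually fail to decrypt a ciphertext can be merged to decrypt it, for example keys for `{"alice"}` and `{"bob"}` against a ciphertext under `["alice", "bob"]`. This lack of collusion resistance is the kind of weakness that combinatorial arguments about sharing-based constructions exploit.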


1.2 Techniques 


We build upon and extend the techniques of Boneh et al. [9] (and a follow-up 
work by Katz and Yerukhimovich [20]) which rule out black-box construction 
of IBE from Trapdoor Permutations (TDP). Along the way, we also simplify 
several aspects of their proof. Given a black-box construction of TPE from IBE, 
our proof proceeds by designing an attack on TPE which succeeds with high 
probability (in fact arbitrarily close to the completeness probability of the pur- 
ported TPE scheme). Somewhat more formally, we build an oracle O relative 
to which a CCA secure IBE exists, but any purported construction of a TPE 
relative to this oracle is insecure. 

Our analysis of the attack roughly consists of a combinatorial part and a
cryptographic part. The combinatorial aspect of our analysis is new and completely
different from that in [9]. While the cryptographic part is similar in structure
to that of [9], we do make several crucial modifications that make our attack
simpler and our analysis cleaner.


A Comparison of the Combinatorial Aspects. At the heart of the proof of [9]
is a combinatorial argument as follows. An IBE system obtained by a
black-box construction from a TDP must embed in its public parameters the public
keys of some permutations of the TDP oracle. The adversary’s main goal is to
collect all the trapdoors corresponding to these permutations. Such trapdoors are
embedded in the decryption keys corresponding to identities in the IBE system.
The main point is that there are only $q = \mathrm{poly}(\kappa)$ many permutations planted
in the public parameters of the IBE, but they must also encode an exponential
number of identities. Therefore, if we look at a sufficiently large set of random
identities and their secret keys, and encrypt and decrypt a random message
under these identities, during at most $q$ of these decryptions we might encounter a
“new” trapdoor (which is planted in the public-key to be used during encryption,
but was not discovered during other decryptions). It follows that if we choose our
identity set $S$ to be of size $k \cdot q$ (and encrypt and decrypt random messages under
them), and then choose an identity $id \leftarrow S$ at random from those $k \cdot q$ identities,
then with probability at least $1 - 1/k$ there is no new (undiscovered) trapdoor
left for this identity $id$. Therefore, whatever is learned during the decryptions of
the encryptions of random messages under the identities $S \setminus \{id\}$ is sufficient to
decrypt a message encrypted under $id$ without knowing its decryption-key.
This combinatorial argument immediately suggests the following attack. Get
decryption-keys for all but a random identity $id_*$ chosen from a large enough
random set $S = \{id_1, \dots, id_{k \cdot q}\}$ of identities. Collect the trapdoors learned from
the encryptions of random messages under the identities in $S \setminus \{id_*\}$, and their
decryptions using the corresponding decryption-keys. Try to decrypt the challenge
ciphertext $C$ encrypted under the identity $id_*$.
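The counting step behind the $1 - 1/k$ bound can be checked in a simplified model. The sketch assumes that each identity's decryption touches a fixed set of trapdoor indices out of $q$ (a modeling assumption, as are all names below). An identity still has an undiscovered trapdoor only if it uses a trapdoor that no other identity in $S$ uses, and each of the $q$ trapdoors can be private to at most one identity.

```python
import random
from typing import Dict, Set

def private_trapdoor_owners(uses: Dict[int, Set[int]]) -> Set[int]:
    """Identities relying on a trapdoor that no other identity touches."""
    owners: Dict[int, Set[int]] = {}
    for ident, trapdoors in uses.items():
        for t in trapdoors:
            owners.setdefault(t, set()).add(ident)
    return {next(iter(ids)) for ids in owners.values() if len(ids) == 1}

def undiscovered_fraction(uses: Dict[int, Set[int]]) -> float:
    """Probability that a uniformly chosen identity keeps an undiscovered trapdoor."""
    return len(private_trapdoor_owners(uses)) / len(uses)

def random_assignment(q: int, k: int, per_identity: int, seed: int = 0):
    """k*q identities, each using per_identity random trapdoors out of q."""
    rng = random.Random(seed)
    return {i: set(rng.sample(range(q), per_identity)) for i in range(k * q)}

def worst_case_assignment(q: int, k: int):
    """The first q identities each own one private trapdoor; the rest use none."""
    return {i: ({i} if i < q else set()) for i in range(k * q)}
```

In the worst case the fraction equals exactly $1/k$, and for any other assignment it can only be smaller. This is why an identity drawn at random from $S$ has no undiscovered trapdoor with probability at least $1 - 1/k$.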

In our case, we have a related but more difficult question: what if we start 
with a more powerful primitive like an IBE and want to construct another “tar- 
get” predicate encryption scheme? Now the intuition behind the combinatorial 
argument of [9] completely breaks down. The reason is that in our new setting, 
by planting only one (master) public-key of the IBE scheme in the public-key of 
the target predicate encryption, the encryption algorithm potentially has access 
to an exponential number of permutations (each indexed by an identity) whose 
trapdoors can be planted in the decryption-keys. In fact, each decryption-key of 
the predicate encryption system might have a unique trapdoor (corresponding 
to a unique identity derived from the description of the predicate). Hence, one 
can’t hope to learn all trapdoors and use them to decrypt the challenge cipher- 
text. Thus, roughly speaking, by moving from a trapdoor permutation oracle 
to various forms of PE oracles such as IBE (as the primitive used in the con- 
struction), we are allowing the “universe” of trapdoor permutations planted in 
the public-key and decryption-keys to be exponentially large (rather than some 
fixed polynomial). The latter difference is the main reason behind the complica- 
tions in the combinatorial aspect of our problem, because suddenly the regime of 
positive results becomes much richer, making the job of proving an impossibility 
result much more challenging. 

Our proof relies on the collusion-resistance property of the predicate encryp- 
tion. The “hope” that an attack exists comes from the following observations: 


— The decryption key for each predicate may still consist of only a polynomial 
number of IBE decryption-keys. 

— Each ciphertext is encrypted using a polynomially large set of identities such 
that a decryption-key for at least one of these identities is required to decrypt 
the ciphertext. On the other hand, each ciphertext can be decrypted by 
keys for an exponential number of different predicates (this follows from the 
property of a threshold encryption scheme). Call such predicates “related”. 

— This exponentially large set of related predicates must share an IBE 
decryption-key since they can decrypt a common ciphertext. 


Our attack works by requesting a sufficient number of decryption-keys for related
predicates (which would still be unable to decrypt the challenge ciphertext). Since
related predicates share IBE decryption-keys, the adversary is able to collect all
“useful” IBE decryption-keys. It is not surprising that the above combinatorial
arguments sound as though they could already be used to attack sharing-based
constructions. Indeed, our core combinatorial lemma (Lemma 10) is used to
refute any sharing-based construction of a TPE from an IBE (Corollary 1).


A Comparison of the Cryptographic Aspects. As in [9], turning the
combinatorial analysis into a full-fledged impossibility result requires non-trivial black-box
separation machinery. For this reason, even though the combinatorial argument
of [9] is relatively simple, the full proof is quite complicated. The explanation
for the complexity of such proofs is that one has to handle all possible
constructions using a trapdoor permutation oracle (and not just where, for example, a
decryption-key simply consists of decryption keys for various identities).

Although the overall structure of our proof is similar to that of [9], there 
are several differences in the detailed arguments. In fact, we make some crucial 
modifications which lead to a more direct attack and cleaner analysis. The first 
major modification is that our attacker “directly” learns the heavy queries
(following the paradigm of [2,3]). In [9], the attack proceeds by having steps (such
as several encryptions of a random bit under the challenge identity, repeating 
a few steps several times) whose indirect purpose is to learn the heavy queries. 
Secondly, since we start with an oracle which roughly provides four functional- 
ities (as opposed to the three functionalities of a trapdoor permutation oracle), 
we need to modify and adapt the techniques of [9] to the new setting. Apart 
from these, there are significant differences in the manner we compare the vari- 
ous experiments which we believe makes the analysis cleaner and more general. 
The details regarding these can be found in Section 5 and in the full version [16]
where we have deferred most of the proofs due to space constraints. 


2 Preliminaries 


Notation. For any probabilistic algorithm A, by $y \leftarrow A(x)$ we denote the process
of executing A over the input x while using fresh randomness (which we do not
represent explicitly) and getting the output y. By a partial oracle we refer to
an oracle which is defined only for some of the queries it might be asked. By
$[x \mapsto y] \in P$ we mean that $P(x) = y$ is defined. For a query x and a partial
oracle P, we abuse the notation and write $x \in P$ whenever an answer for x
is defined in P. By Supp(X) we refer to the support set of the random variable
X. For a random variable S whose values are sets, we call an element x $\epsilon$-heavy
if $\Pr[x \in S] \geq \epsilon$. The view of any probabilistic oracle algorithm A, denoted as
View(A), refers to its input, private randomness, and oracle answers (which all
together determine the whole execution of A).


Definition 1 (Predicate Encryption). A predicate encryption scheme PE
for the predicate set $F_\kappa$ and attribute set $A_\kappa$ with completeness $\rho$ consists of four
probabilistic polynomial time algorithms PE = (G, K, E, D) such that for every
predicate $f \in F_\kappa$, every attribute $a \in A_\kappa$ such that $f(a) = 1$, and every message
M, if we do the following steps, then with probability at least $\rho$ it holds that
$M' = M$: (i) generate a public-key and a master secret-key $(PK, SK) \leftarrow G(1^\kappa)$,
(ii) get a decryption-key $DK_f \leftarrow K(SK, f)$ for the predicate $f \in F_\kappa$, (iii) encrypt
the message M under the attribute $a \in A_\kappa$ and get $C \leftarrow E(PK, a, M)$, and finally,
(iv) decrypt C using the decryption-key $DK_f$ and get $M' \leftarrow D(PK, DK_f, C)$.

Definition 2 (Neighbor Sets of Predicates and Attributes). For every
set of predicates F and $f \in F$, and for every set of attributes A and $a \in A$, we
define the following terminology:

— $N(f) = \{a \mid a \in A, f(a) = 1\}$ and similarly $N(a) = \{f \mid f \in F, f(a) = 1\}$.

— $\deg(f) = |N(f)|$ and $\deg(a) = |N(a)|$.

Since we always work with families of algorithms and sets indexed by a security
parameter $\kappa$, when it is clear from the context we might omit the index $\kappa$.


Definition 3 (Security of Predicate Encryption). Let PE = (G, K, E, D)
be a predicate encryption scheme with the predicate set F and the attribute set
A. PE is said to be CPA secure if for any PPT adversary Adv participating in
the experiment below, the probability of Adv correctly outputting the bit b is at
most $1/2 + \mathrm{neg}(\kappa)$:

1. Setup: Generate the keys $(PK, SK) \leftarrow G(1^\kappa)$ and give PK to Adv.

2. Query Keys: Adv adaptively queries some predicates $f_i \in F$ for $i = 1, 2, \ldots$
and is given the corresponding decryption-keys $DK_i \leftarrow K(SK, f_i)$.

3. Challenge: Adv submits an attribute $a \in A$ and a pair of messages $M_0 \neq M_1$
of the same length $|M_0| = |M_1|$ conditioned on

$f_i(a) = 0$ for every predicate $f_i$ whose key $DK_i$ is acquired by Adv    (1)

and is given $C \leftarrow E(PK, a, M_b)$ for a randomly selected $b \leftarrow \{0,1\}$.

4. Adv continues to query keys for predicates subject to condition (1) and
finally outputs a bit.


PE is said to be CCA secure if for any PPT adversary Adv participating in a
modified experiment (explained next), the probability of Adv correctly outputting
the bit b is at most $1/2 + \mathrm{neg}(\kappa)$. The modified experiment proceeds identically to
the above experiment, except that after Step 3, Adv is also allowed to adaptively
query ciphertexts $C_i$ for $i = 1, 2, \ldots$ encrypted under the attribute a, with the
condition that $C_i \neq C$ for any i, and he is given the decrypted message $M \leftarrow
D(DK_f, C_i)$, where $DK_f \leftarrow K(SK, f)$ is a decryption-key for a predicate f such
that $f(a) = 1$.

Definition 4 (Identity-based Encryption [32]). An Identity-Based Encryption
scheme is a predicate encryption scheme where (1) the predicate and
attribute sets are equal, $A = F = \{0,1\}^\kappa$ (and are called the set of identities), and
(2) for every predicate $f \in \{0,1\}^\kappa$ and every attribute $a \in \{0,1\}^\kappa$ we have that
$f(a) = 1$ if and only if $f = a$.

Definition 5 (Threshold Predicate Encryption [31]). A Threshold Predicate
Encryption with threshold $0 < \tau < 1$ (or simply a $\tau$-TPE) is a predicate
encryption where both the predicate and the attribute sets are equal to $\{0,1\}^\kappa$ and
for any predicate $f \in \{0,1\}^\kappa$ and any attribute $a \in \{0,1\}^\kappa$ we have that $f(a) = 1$
if and only if $\langle f, a \rangle \geq \tau \cdot \kappa$, where $\langle f, a \rangle$ is the inner product of the Boolean vectors
$f = (f_1, \ldots, f_\kappa)$, $a = (a_1, \ldots, a_\kappa)$, defined as $\langle f, a \rangle = \sum_{i \in [\kappa]} a_i \cdot f_i$.


The notion of threshold predicate encryption was defined in [31] and is also
known as the fuzzy IBE.
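
The two predicate families of Definitions 4 and 5, together with the neighbor sets of Definition 2, can be made concrete for very small $\kappa$. The following is only an illustrative sketch: the function names are ours, vectors are represented as tuples of bits, and the brute-force enumeration is feasible only for toy parameters.

```python
from itertools import product


def ibe_predicate(f, a):
    """IBE viewed as a predicate: f(a) = 1 iff the identities are equal."""
    return int(tuple(f) == tuple(a))


def tpe_predicate(f, a, tau):
    """tau-TPE: f(a) = 1 iff <f, a> >= tau * kappa, where kappa = len(f)."""
    assert len(f) == len(a)
    inner = sum(fi * ai for fi, ai in zip(f, a))
    return int(inner >= tau * len(f))


def neighbors_of_predicate(f, kappa, predicate):
    """N(f) = {a : f(a) = 1}, by brute force over {0,1}^kappa (toy sizes only)."""
    return [a for a in product((0, 1), repeat=kappa) if predicate(f, a)]
```

For instance, with $\kappa = 4$ and $\tau = 1/2$ the predicate f = (1, 1, 0, 0) accepts exactly the attributes whose first two bits are set, while as an IBE identity it accepts only itself.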


3 Sharing-Based Constructions and Impossibility Results 


In this section, we describe two intuitive and simple approaches to build a
predicate encryption scheme using another predicate encryption scheme as a
black-box. It is interesting that the simpler of the two, the OR-based approach,
turns out to be as powerful as the seemingly more general sharing-based
approach. Even though ruling out constructions using these approaches is a weaker
impossibility result than an unrestricted black-box separation (as we will do in
Section 5), it seems instructive to refute these natural and general approaches
to black-box reductions among predicate encryption schemes. In fact, our proof
refuting OR-based constructions of TPE from this section forms the combinatorial
core of our subsequent proof of a general black-box separation in Section 5.
Moreover, the basic approach to building the attack needed in our proof (as well
as that in [9]) of the general black-box separation results seems to benefit from
keeping the sharing-based constructions in mind. In Section 4, we investigate
a new approach to refute sharing-based constructions using (proved or
conjectured) separation results in two-party communication complexity. In particular,
we can use conjectures in communication complexity to give evidence that
$\mathrm{NC}^1$-predicate encryption is strictly harder than $\mathrm{AC}^0$-predicate encryption.


Definition 6. Let (F, A) and (G, B) be two pairs of predicate and attribute sets.
We call $S(\cdot)$ a q-set system for (F, A) using (G, B) if S is a mapping defined over
$F \cup A$ such that: (1) for every $f \in F$ it holds that $S(f) \subseteq G$, and for every $a \in A$
it holds that $S(a) \subseteq B$, and (2) for every $x \in F \cup A$ it holds that $|S(x)| \leq q$.


Definition 7 (OR-based Construction). We say there is an OR-based
construction with set-size q for the pair of predicate and attribute sets ($F = \{f_1, \ldots\}$,
$A = \{a_1, \ldots\}$) using another pair ($G = \{\varphi_1, \ldots\}$, $B = \{\alpha_1, \ldots\}$) if there
exists a q-set system $S(\cdot)$ for (F, A) using (G, B) such that: for every $f \in F$
and $a \in A$, if $S(f) = \{\varphi_1, \ldots, \varphi_{d_f}\}$ and $S(a) = \{\alpha_1, \ldots, \alpha_{d_a}\}$, then
$f(a) = \bigvee_{i \in [d_f], j \in [d_a]} \varphi_i(\alpha_j)$. We call the OR-based construction efficient if the mapping
$S(\cdot)$ is efficiently computable.


The encryption under attribute a of an OR-based construction works by
encrypting a message M independently under every $\alpha_j \in S(a)$ and concatenating
the corresponding ciphertexts. The decryption key for a predicate f is simply
the set of keys $DK_j$ for all $j \in [d_f]$, where $DK_j$ is the decryption key for $\varphi_j$.
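
As a rough illustration of this wrapper, the sketch below builds an OR-based scheme on top of a black-box scheme object. Everything here is an assumption made for the example: the method names (`gen`, `keygen`, `enc`, `dec`), the convention that a failed sub-decryption returns `None`, the try-all-pairs decryption loop, and above all `ToyIBE`, which is a deliberately insecure stand-in that only models which keys open which ciphertexts.

```python
class ToyIBE:
    """Insecure stand-in for the black-box scheme PE_2 (equality predicate)."""

    def gen(self):
        return "pk", "sk"

    def keygen(self, sk, identity):
        return ("dk", identity)

    def enc(self, pk, identity, message):
        # No hiding at all: this only models *which* keys open a ciphertext.
        return ("ct", identity, message)

    def dec(self, pk, dk, ct):
        return ct[2] if dk[1] == ct[1] else None


class OrBasedPE:
    """OR-based wrapper: S_pred(f) and S_attr(a) play the role of the set system S."""

    def __init__(self, base, S_pred, S_attr):
        self.base, self.S_pred, self.S_attr = base, S_pred, S_attr

    def gen(self):
        return self.base.gen()

    def keygen(self, sk, f):
        # The key for f is the collection of sub-keys for all phi in S(f).
        return [self.base.keygen(sk, phi) for phi in self.S_pred(f)]

    def enc(self, pk, a, message):
        # Encrypt the same message independently under every alpha in S(a).
        return [self.base.enc(pk, alpha, message) for alpha in self.S_attr(a)]

    def dec(self, pk, dk, ct):
        # Try every (sub-key, sub-ciphertext) pair; succeed if any pair matches.
        for sub_key in dk:
            for sub_ct in ct:
                m = self.base.dec(pk, sub_key, sub_ct)
                if m is not None:
                    return m
        return None
```

With `S_pred` mapping a set of identities to its elements and `S_attr` mapping an identity to the singleton containing it, a key for {"alice", "bob"} opens ciphertexts for "bob" but not for "carol".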


Lemma 8. Suppose there exists an efficient OR-based construction for (F, A)
using (G, B). Then a secure predicate encryption scheme $PE_1 = (G_1, K_1, E_1, D_1)$
for (F, A) with completeness $\rho$ can be constructed (in a black-box way) from any
secure predicate encryption scheme $PE_2 = (G_2, K_2, E_2, D_2)$ for (G, B) with
completeness $\rho$.


Clearly, the OR-based construction of Lemma 8 is not the only way that one can
imagine to construct an F-PE from a G-PE. In fact, as noted also by [20] in
the context of using trapdoor permutations, there is a possibility of employing
a more complicated “sharing-based” approach that generalizes the OR-based
construction. The idea is to use a set system $S(\cdot)$ in a similar way to the OR-based
construction, but to encrypt the message M differently: instead of encrypting
the message M $d_a$ times, first construct some “shares” $M_1, \ldots, M_{d_a}$ of M, and
then encrypt each $M_i$ using $\alpha_i$. To get the completeness and the security, we
need the following two properties.


— Completeness: For every $f \in F$ such that $f(a) = 1$, the set of indices
$I_S(a, f) = \{j \mid \exists \varphi \in S(f) \text{ such that } \varphi(\alpha_j) = 1\}$ is rich enough that
$\{M_i \mid i \in I_S(a, f)\}$ can be used to reconstruct M.

— Security: For every choice of $a_*, f_*, f_1, \ldots, f_k$ for $k = \mathrm{poly}(\kappa)$ such that
$f_*(a_*) = 1$ and $f_i(a_*) = 0$ for all $i \in [k]$, it holds that
$C_S(a_*, f_*) \not\subseteq \bigcup_{j \in [k]} C_S(a_*, f_j)$, where $C_S(a, f) = \{\alpha_i \mid i \in I_S(a, f)\}$. This is because otherwise
the adversary can acquire keys for $f_1, \ldots, f_k$ and use the sub-keys planted
in them to decrypt enough of the shares $M_i$ and reconstruct M, which
is encrypted under the attribute $a_*$.
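
The security condition is a purely set-theoretic statement about the set system, so it can be checked mechanically on small examples. The sketch below is a hypothetical helper (names and interfaces are ours): `S_attr` and `S_pred` return the sub-attributes and sub-predicates, and `sub_predicate(phi, alpha)` evaluates a sub-predicate. It only tests the containment; it does not verify that the colluding predicates reject $a_*$.

```python
def covered(S_attr, S_pred, sub_predicate, a, f):
    """C_S(a, f): sub-attributes of a accepted by some sub-predicate of f."""
    return {alpha for alpha in S_attr(a)
            if any(sub_predicate(phi, alpha) for phi in S_pred(f))}


def collusion_breaks(S_attr, S_pred, sub_predicate, a_star, f_star, colluders):
    """True iff C_S(a*, f*) is contained in the union of C_S(a*, f_j)."""
    target = covered(S_attr, S_pred, sub_predicate, a_star, f_star)
    pooled = set()
    for f in colluders:
        pooled |= covered(S_attr, S_pred, sub_predicate, a_star, f)
    return target <= pooled
```

When this function returns True, the pooled sub-keys of the colluders open every share that the legitimate key for $f_*$ could open, which is exactly the situation the security property forbids.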


Despite the fact that the sharing-based approach is more general than the
OR-based approach, for the case of polynomial sized sets $q = \mathrm{poly}(\kappa)$, we show
that the construction of Lemma 8 is indeed as powerful as any sharing-based
approach:


Lemma 9. There is a sharing based construction for the predicate system F 
using G if and only if there exists an OR-based construction. 


Note that by proving Theorem 19 we shall rule out OR-based (and hence
sharing-based) constructions along the way. A special case of the following
combinatorial lemma, Corollary 11, shows that no OR-based (nor sharing-based)
construction of $\tau$-TPE from IBE exists for any constant $0 < \tau < 1$. Moreover,
not surprisingly, we will use this lemma in our proof of Theorem 19.


Lemma 10. Let $F = A = \{0,1\}^\kappa$ denote the set of attributes and predicates for
$\tau$-TPE for a constant $0 < \tau < 1$. Also suppose that the following sets of size at
most $q = \mathrm{poly}(\kappa)$ are assigned to F, A, and $F \times A$: $S(a)$ for $a \in A$, $S(f)$ for
$f \in F$, and $S(a, f)$ for $(a, f) \in A \times F$. Then, there exists a sampling algorithm
Samp that, given an input parameter $\epsilon > 1/\mathrm{poly}(\kappa)$, outputs $k + 1 = \mathrm{poly}(\kappa)$
pairs $(f_*, a_*), (f_1, a_1), \ldots, (f_k, a_k)$ such that with probability at least $1 - \epsilon$ over
the randomness of Samp the following holds:

1. $f_*(a_*) = 1$ and $f_i(a_i) = 1$ for all $i \in [k]$ (this part holds with probability 1),
2. $f_i(a_*) = 0$ for all $i \in [k]$,
3. $S(a_*) \cap S(f_*) \cap S(a_*, f_*) \subseteq \bigcup_{i \in [k]} S(a_i, f_i)$.


Moreover, the algorithm Samp chooses its k + 1 pairs without the knowledge of
the set system $S(\cdot)$. Therefore we call Samp an oblivious sampler against the
predicate structure of $\tau$-TPE.


Note that although F = A, the sets S(a) for a € A and S(f) for f € F are 
potentially different even if a and f represent the same string. Intuitively, the 
set S(a) refers to the set of sub-attributes (or identities in case of using IBE as 
the black-box primitive) used during an encryption of a random message under 
the attribute a, the set S(f) refers to the set of decryption-keys planted in the 
decryption-key of f, and finally S(a, f) refers to the decryption-keys discovered 
during the decryption of the mentioned random encryption (under the attribute 
a) using the generated key for f. 
Proof. Let A be the set of vectors in $\{0,1\}^\kappa$ of normalized Hamming weight $\tau$,
namely $A = \{a \mid a = (a_1, \ldots, a_\kappa) \in \{0,1\}^\kappa, \sum_i a_i = \tau \cdot \kappa\}$. Also let F be the set
of vectors in $\{0,1\}^\kappa$ of normalized Hamming weight $\tau'$, for a constant $\tau < \tau' < 1$. Consider a
bipartite graph G with nodes (A, F) and connect $a \in A$ to $f \in F$ iff $f(a) = 1$
according to $\tau$-TPE (i.e., the indexes of the nonzero components of a are a subset
of those of f). We will later use the fact that G is a regular graph (on its F side).
For any vertex x in G let N(x) be the set of neighbors of x in the graph G. The
covering-sampler acts as follows: Choose $p = \mathrm{poly}(\kappa)$ and $h = \mathrm{poly}(\kappa)$ to satisfy
$q\left(\frac{1}{h} + \frac{1}{p} + \left(1 - \frac{1}{h}\right)^p\right) < \frac{\epsilon}{2}$ (e.g., this can be done by setting $h = \sqrt{p}$ and choosing
p large enough). Choose $f_* \leftarrow F$ at random. Choose $a_*, a_1, \ldots, a_p \leftarrow N(f_*)$ at
random with possible repetition from the neighbors of $f_*$. For each $i \in [p]$, choose
p random neighbors $f_{i1}, \ldots, f_{ip} \leftarrow N(a_i)$ of $a_i$ (repetition is allowed). Output the
$p^2 + 1$ pairs: $(a_*, f_*), \{(a_i, f_{ij})\}_{i \in [p], j \in [p]}$.
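
The covering-sampler just described touches only the predicate structure, so it is easy to prototype. The following is a sketch under our own conventions: vectors are bit tuples, `w_attr` and `w_pred` stand for the integer Hamming weights $\tau\kappa$ and $\tau'\kappa$, and a uniform neighbor is drawn as a uniform subset (respectively superset) of the required weight.

```python
import random


def contains(f, a):
    """f(a) = 1 in the graph G: the support of a is a subset of the support of f."""
    return all(fi >= ai for fi, ai in zip(f, a))


def random_subset_vector(support, weight, kappa, rng):
    """Uniform vector of the given weight whose support lies inside `support`."""
    chosen = set(rng.sample(sorted(support), weight))
    return tuple(1 if i in chosen else 0 for i in range(kappa))


def random_superset_vector(a, weight, kappa, rng):
    """Uniform vector of the given weight whose support contains that of a."""
    ones = {i for i in range(kappa) if a[i]}
    zeros = [i for i in range(kappa) if not a[i]]
    extra = set(rng.sample(zeros, weight - len(ones)))
    return tuple(1 if (i in ones or i in extra) else 0 for i in range(kappa))


def samp(kappa, w_attr, w_pred, p, seed=None):
    """Return (a_*, f_*) and the p*p pairs (a_i, f_ij); never looks at S(.)."""
    rng = random.Random(seed)
    f_star = random_subset_vector(range(kappa), w_pred, kappa, rng)
    support = [i for i in range(kappa) if f_star[i]]
    a_star = random_subset_vector(support, w_attr, kappa, rng)
    pairs = []
    for _ in range(p):
        a_i = random_subset_vector(support, w_attr, kappa, rng)
        for _ in range(p):
            pairs.append((a_i, random_superset_vector(a_i, w_pred, kappa, rng)))
    return (a_star, f_star), pairs
```

Property (1) holds for every output by construction; property (2) holds only with high probability and only once $\kappa$ is large, so a toy run may violate it.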

Now we prove that with probability at least $1 - \epsilon/2 - \mathrm{neg}(\kappa) \geq 1 - \epsilon$ the
output pairs have the properties specified in Lemma 10.

Property (1) holds by construction.

Since $0 < \tau < \tau' < 1$ are constants, using standard probabilistic arguments
one can easily show that the probability of $f_{ij}$ being connected to $a_*$ in G (i.e.,
$f_{ij}(a_*) = 1$) is $\mathrm{neg}(\kappa)$ (given that $a_*, a_i$ are random subsets of $f_*$, a random superset
$f_{ij}$ of $a_i$ is exponentially unlikely to pick all the elements of $a_*$). Thus (2) holds.

The challenging part is to show that (3) holds, i.e., the following: With
probability at least $1 - q\left(\frac{1}{h} + \frac{1}{p} + \left(1 - \frac{1}{h}\right)^p\right) > 1 - \epsilon/2$ it holds that
$S(a_*) \cap S(f_*) \cap S(a_*, f_*) \subseteq \bigcup_{ij} S(a_i, f_{ij})$. The proof will go through several claims.

In the following let $h = \sqrt{p}$. For an attribute node $a \in A$ of G, define H(a)
to be the set of “heavy” elements that with probability at least 1/h are present
in S(a, f) for a random neighbor f of a, i.e., $H(a) = \{x : \Pr[x \in S(a, f) \mid
f \leftarrow N(a)] \geq 1/h\}$. Note that H(a) is not necessarily a subset of S(a).


Claim. Define $BE_1$ to be the bad event “$S(a_*) \cap S(a_*, f_*) \not\subseteq H(a_*)$.” Then,
$\Pr[BE_1] \leq q/h$.


Proof. Since G is regular on its F side, conditioned on a fixed $a_*$, the distribution
of $f_*$ is still uniform over $N(a_*)$. Now fix $a_*$ and fix an element $b \in S(a_*)$. If
b is not in $H(a_*)$, then over the random choice of $f_* \leftarrow N(a_*)$, it holds that
$\Pr[b \in S(a_*, f_*)] < 1/h$. The claim follows by a union bound over the q elements
in $S(a_*)$.


Claim. Define $BE_2$ to be the bad event “there exists a $b \in S(f_*)$ such that
$b \in H(a_*)$ but for every $i \in [p]$, $b \notin H(a_i)$, i.e., $S(f_*) \cap H(a_*) \not\subseteq \bigcup_i H(a_i)$.” Then,
$\Pr[BE_2] \leq q/p$.


Proof. It is enough to bound the probability of $BE_2$ by 1/p for a fixed $b \in S(f_*)$; the claim
then follows by a union bound over the elements of $S(f_*)$. But when $b \in S(f_*)$ is fixed,
we can pretend that $a_*$ is chosen at random from the sequence $a_0, \ldots, a_p$ after
they are chosen and are fixed. In that case $BE_2$ happens if there is only a unique
$j \in \{0, \ldots, p\}$ such that $b \in H(a_j)$ and $a_*$ chooses to be $a_j$. The latter happens
with probability at most $1/(p + 1) < 1/p$.
Claim. Define $BE_3$ to be the bad event “given neither $BE_1$ nor $BE_2$ happens,
$S(a_*) \cap S(f_*) \cap S(a_*, f_*) \not\subseteq \bigcup_{ij} S(a_i, f_{ij})$.” Then, $\Pr[BE_3] \leq q(1 - 1/h)^p$.


Proof. We assume events $BE_1$ and $BE_2$ have not happened and perform the
analysis. By $\neg BE_1$, we have $S(a_*) \cap S(a_*, f_*) \subseteq H(a_*)$. Moreover, since $\neg BE_2$
holds, any element $b \in S(f_*) \cap H(a_*)$ will be in $H(a_i)$ for at least one $i \in [p]$.
Therefore for each $j \in [p]$, $\Pr[b \in S(a_i, f_{ij})] \geq 1/h$ holds by the definition of
heavy sets, and thus $b \notin \bigcup_{ij} S(a_i, f_{ij})$ can hold only with probability at most
$(1 - 1/h)^p$. By a union bound, the probability that there exists a $b \in S(a_*) \cap
S(f_*) \cap S(a_*, f_*)$ such that $b \notin \bigcup_{ij} S(a_i, f_{ij})$ is bounded by $q(1 - 1/h)^p$.


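From the three claims above, it follows that (3) fails with probability at most
$q\left(\frac{1}{h} + \frac{1}{p} + \left(1 - \frac{1}{h}\right)^p\right) < \frac{\epsilon}{2}$. Therefore, the sampled $[a_*, f_*, \{a_i, f_{ij}\}_{i \in [p], j \in [p]}]$ will have the
desired properties with probability at least $1 - \mathrm{neg}(\kappa) - \epsilon/2$, which finishes the
proof of Lemma 10.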
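
To see how large p has to be in practice, one can evaluate the failure bound $q(1/h + 1/p + (1 - 1/h)^p)$ with $h = \sqrt{p}$ directly. The sketch below is only a numerical illustration; the doubling search and the function names are our own choices, not part of the proof.

```python
import math


def failure_bound(q, p):
    """q * (1/h + 1/p + (1 - 1/h)^p) with h = sqrt(p), as in the proof."""
    h = math.sqrt(p)
    return q * (1 / h + 1 / p + (1 - 1 / h) ** p)


def smallest_power_of_two_p(q, eps):
    """Smallest p = 2^t (t >= 2) for which the bound drops below eps / 2."""
    p = 4
    while failure_bound(q, p) >= eps / 2:
        p *= 2
    return p
```

The dominant term is $q/\sqrt{p}$, so p grows roughly like $(2q/\epsilon)^2$, which is still polynomial whenever q and $1/\epsilon$ are.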


Using Lemma 10, it is almost straightforward to prove the following.


Corollary 11. For any constant $0 < \tau < 1$, there is no OR-based (nor
sharing-based) construction of $\tau$-TPE schemes from IBE schemes.


4 The Communication Complexity Approach 


In this section, we show an alternative general approach to refute sharing-based 
constructions of predicate encryption schemes using separation results in two- 
party communication complexity. In particular, using conjectured separations in 
communication complexity, we prove the impossibility of a sharing-based
construction of $\mathrm{NC}^1$-PE from $\mathrm{AC}^0$-PE, thus making some progress toward the
question of separating PE schemes based on the complexity classes the underly- 
ing predicates come from. On the other hand, we are currently able to apply this 
approach only to sharing-based constructions rather than to general black-box 
constructions. 

Let (A, F) be a predicate encryption scheme. W.l.o.g. we identify A with
$\{0,1\}^\kappa$ and think of F as a family of functions $\{f_b : \{0,1\}^\kappa \to \{0,1\}\}_{b \in \{0,1\}^\kappa}$,
i.e., we assume for simplicity that $|F| = 2^\kappa$ and its members are also indexed by
$b \in \{0,1\}^\kappa$. We may abuse this notation and refer to b itself as a member of F.
We can then talk about the communication complexity of F when $b \in F$ is given
to Bob and $a \in A$ to Alice. We can represent this communication complexity
problem by the {0,1}-matrix with rows indexed by A and columns by F. With
a little more abuse of notation, we denote this matrix also by $F = (f_b(a))_{a,b}$ and
refer to the communication complexity of F. Recall that the essential resource
in communication complexity is the number of bits Alice and Bob need to
communicate to determine $f_b(a)$. Various models such as deterministic, randomized
(public or private coins), nondeterministic, etc., communication complexity can
be defined naturally. For details on such models, we refer to the classic book by
Kushilevitz and Nisan [22], the paper by Babai et al. [1], and the surveys by
Lokam [26] and Lee and Shraibman [23].

To connect communication complexity to OR-based constructions using IBE,
we use the model of Merlin-Arthur (MA) games in communication complexity:


Definition 12 (Merlin-Arthur Protocols [21]). A matrix F is said to have
an MA-protocol of complexity $\ell + c$ if there exists a c-bit randomized public-coin
verification protocol $\Pi$ between Alice and Bob such that

— $F(a, b) = 1 \Rightarrow \exists w \in \{0,1\}^\ell \ \Pr[\Pi((a, w), (b, w)) = 1] \geq 2/3$,

— $F(a, b) = 0 \Rightarrow \forall w \in \{0,1\}^\ell \ \Pr[\Pi((a, w), (b, w)) = 1] \leq 1/3$.


The MA-complexity of F, denoted MA(F), is the minimum complexity of an MA
protocol for the matrix F.


With this definition, the well-known fact (see, for example, [22]) that EQUALITY
has public coin randomized communication complexity of O(1), and our
Definition 7 of OR-construction, the following lemma is easy.


Lemma 13. Suppose there is an OR-based construction of a predicate encryption
scheme (A, F) using an IBE scheme (B, G). Then $MA(F) = O(\log \kappa)$.
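
One natural way to realize such a protocol, sketched here under our own assumptions rather than taken from the paper, is for Merlin to name a pair of indices (i, j) and for Alice and Bob to run a public-coin equality test on the j-th identity of S(a) and the i-th identity of S(f). Identities are modeled as non-negative integers of `n_bits` bits, and the equality test uses random parity checks with two rounds, which gives one-sided error 1/4.

```python
import random


def parity(x):
    """Parity of the number of 1 bits of a non-negative integer."""
    return bin(x).count("1") % 2


def equality_test(alice_value, bob_value, n_bits, rng, rounds=2):
    """Public-coin EQUALITY: Alice sends one parity bit per round, Bob compares."""
    for _ in range(rounds):
        r = rng.getrandbits(n_bits)  # shared random mask (the public coins)
        if parity(alice_value & r) != parity(bob_value & r):
            return False
    return True


def ma_verify(S_a, S_f, witness, n_bits, rng):
    """Merlin's witness (i, j) claims that S_f[i] == S_a[j]."""
    i, j = witness
    if not (0 <= i < len(S_f) and 0 <= j < len(S_a)):
        return False
    return equality_test(S_a[j], S_f[i], n_bits, rng)
```

The witness costs about $2 \log q = O(\log \kappa)$ bits and the verification a constant number of bits; equal identities are always accepted, and unequal ones pass with probability 1/4, below the 1/3 threshold of Definition 12.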


Using a result due to Klauck that $MA(\mathrm{DISJOINTNESS}) = \Omega(\sqrt{\kappa})$, we can
show the following.


Theorem 14. For some constant $0 < \tau < 1$, e.g., $\tau = 1/3$, there is no OR-based
(and hence no sharing-based) construction of a $\tau$-TPE scheme from IBE.


To derive separations among stronger predicate encryption schemes based on
sharing constructions, we need to recall definitions of languages and
complexity classes in two-party communication complexity, in particular, $\mathrm{PH}^{cc}$ and
$\mathrm{PSPACE}^{cc}$.

Complexity classes in two-party communication complexity are defined in
terms of languages consisting of pairs of strings (a, b) such that |a| = |b|. Denote
by $\{0,1\}^{2*}$ the universe $\{(a, b) : a, b \in \{0,1\}^* \text{ and } |a| = |b|\}$. For a language
$L \subseteq \{0,1\}^{2*}$, we denote its characteristic function on pairs of strings of length $\kappa$
by $L_\kappa$. The language $L_\kappa$ is naturally represented as a $2^\kappa \times 2^\kappa$ matrix with {0,1}
or $\pm 1$ entries.


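Definition 15. Let $l_1(\kappa), \ldots, l_d(\kappa)$ be nonnegative integers such that
$l(\kappa) := \sum_{i=1}^{d} l_i(\kappa) \leq (\log \kappa)^c$ for a fixed constant $c > 0$. A language $L \subseteq \{0,1\}^{2*}$
is in $\Sigma_d^{cc}$ if there exist $l_1(\kappa), \ldots, l_d(\kappa)$ as above and Boolean functions
$\varphi, \psi : \{0,1\}^{\kappa + l(\kappa)} \to \{0,1\}$ such that $(a, b) \in L_\kappa$ if and only if
$\exists u_1 \forall u_2 \ldots Q_d u_d \ (\varphi(a, u) \diamond \psi(b, u))$, where $|u_i| = l_i(\kappa)$, $u = u_1 \ldots u_d$, $Q_d$ is $\forall$ for d even and
is $\exists$ for d odd, and $\diamond$ stands for $\vee$ if d is even and for $\wedge$ if d is odd.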


— By allowing a bounded number of alternating quantifiers, we get an analog
of the polynomial time hierarchy: $\mathrm{PH}^{cc} = \bigcup_{d \geq 0} \Sigma_d^{cc}$.

— By allowing an unbounded, but at most $\mathrm{polylog}(\kappa)$, number of alternating quantifiers,
we get an analog of PSPACE: $\mathrm{PSPACE}^{cc} = \bigcup_{c > 0} \bigcup_{d \leq (\log \kappa)^c} \Sigma_d^{cc}$.
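
The quantifier structure of Definition 15 can be evaluated by brute force on toy inputs. The sketch below is illustrative only: `phi` and `psi` are arbitrary Python callables taking a party's input and the concatenated tuple u, `lengths` lists $l_1, \ldots, l_d$, and the running time is exponential in their sum.

```python
from itertools import product


def sigma_accepts(phi, psi, a, b, lengths):
    """Evaluate  E u_1  A u_2 ... Q_d u_d  (phi(a, u) <> psi(b, u))  by brute force."""
    d = len(lengths)
    # The connective is OR when d is even and AND when d is odd.
    if d % 2 == 0:
        combine = lambda x, y: x or y
    else:
        combine = lambda x, y: x and y

    def go(level, prefix):
        if level == d:
            return bool(combine(phi(a, prefix), psi(b, prefix)))
        branches = (go(level + 1, prefix + u)
                    for u in product((0, 1), repeat=lengths[level]))
        # u_1, u_3, ... are existential; u_2, u_4, ... are universal.
        return any(branches) if level % 2 == 0 else all(branches)

    return go(0, ())
```

For d = 1, taking u to be an index and letting phi and psi read the indexed bit of a and b respectively decides whether the two strings share a 1, the complement of disjointness.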


The following lemma shows a connection between the communication complexity
class $\mathrm{PH}^{cc}$ and OR-based constructions using $\mathrm{AC}^0$-predicate encryption.

Lemma 16. Suppose a predicate encryption scheme (A, F) is obtained by an
OR-based construction using an $\mathrm{AC}^0$-predicate encryption scheme. Then the
language given by the sequence of matrices $\{F\}_\kappa$ is in $\mathrm{PH}^{cc}$.


Proof. By hypothesis, for a given $f_b \in F$, we have $\mathrm{AC}^0$ circuits $\varphi_{1b}, \ldots, \varphi_{qb}$
and for a given $a \in A$, we have $\alpha_{1a}, \ldots, \alpha_{qa}$ such that $f_b(a) = \bigvee_{i,j} \varphi_{ib}(\alpha_{ja})$.
Knowing $f_b$, Bob can compute the circuit $C_b(z) = \bigvee_{i,j} \varphi_{ib}(z_j)$, where $z =
(z_1, \ldots, z_q)$, $|z_j| = |\alpha_{ja}|$. Knowing a, Alice can compute $\alpha_a = (\alpha_{1a}, \ldots, \alpha_{qa})$
on which $C_b$ needs to be evaluated. We give a protocol with a bounded number
of alternations for F. Let the depth of $C_b$ be d (including the top OR-gate). An
existential player will have a move for an OR gate in $C_b$ and a universal player
will have a move for an AND gate. Their d moves will describe an accepting path
in $C_b$ on $\alpha_a$. For example, assuming AND and OR gates alternate in successive
layers, $\exists w_1 \forall w_2 \cdots Q_d w_d \ \gamma(C_b, w_1, \ldots, w_d)(\alpha_a)$ describes a path in $C_b$ (start
with the top OR gate and follow the wire $w_1$ to the AND gate below and then
the wire $w_2$ from this gate and so on) ending in a gate $\gamma := \gamma(\ldots)$ to witness
the claim that $f_b(a) = 1$. Since Bob knows $C_b$, he can verify the correctness
of the path $w_1 w_2 \cdots w_d$ in the circuit and the type of the gate $\gamma$ given by the
path. He then sends the labels of the inputs and the type (AND or OR) of the
gate to Alice, who responds with $\gamma(\alpha_a)$. Bob can verify that this will ensure
$C_b(\alpha_a) = 1$. On the other hand, if $C_b(\alpha_a) = 0$, then it is easy to see that the
existential player will not have a winning strategy to pass the verification protocol
of Alice and Bob on their inputs a and $C_b$. It follows that F has a protocol with
at most d alternations and hence $\{F\}_\kappa \in \mathrm{PH}^{cc}$.


This lemma enables us to show the impossibility of OR-based constructions of
predicate encryption schemes using $\mathrm{AC}^0$-predicate encryption. In particular,


Theorem 17. Suppose $\mathrm{PH}^{cc} \neq \mathrm{PSPACE}^{cc}$. Then, there is no OR-based
construction of an $\mathrm{NC}^1$-PE scheme from any $\mathrm{AC}^0$-PE scheme. In particular, there
is an $\mathrm{NC}^1$-function family F (derived from so-called Sipser functions [33]) such
that (A, F) does not have an OR-based construction from any $\mathrm{AC}^0$-PE scheme.


However, it is a longstanding open question in communication complexity to
separate $\mathrm{PSPACE}^{cc}$ from $\mathrm{PH}^{cc}$. Currently it is known that such a separation
holds if certain Boolean matrices can be shown to have high rigidity, a connection
explained in [29,25].


Corollary 18. Suppose Hadamard matrices are as highly rigid as demanded
in [29,25]. Then, predicate encryption defined by the parity functions (arising
from the Inner Product mod 2 matrix) does not have an OR-based construction from
any $\mathrm{AC}^0$-predicate encryption scheme.


5 Separating TPE from IBE 


In this section, we prove that there is no general black-box construction of thresh- 
old predicate encryption schemes from identity-based encryption schemes. 

Theorem 19. Let $\kappa \in \mathbb{N}$ be the security parameter. Then, there exists an oracle
O relative to which CCA secure IBE schemes exist, as per Definition 3. However,
for any constant $0 < \tau < 1$, there exists a query-efficient (i.e., that makes at
most $\mathrm{poly}(\kappa)$ queries to O) adversary Adv that can break even the CPA security
of any $\tau$-TPE scheme relative to O, again as per Definition 3. Moreover, Adv
can be implemented in $\mathrm{poly}(\kappa)$-time if given access to a PSPACE oracle, and
its success probability can be made arbitrarily close to the completeness of the
$\tau$-TPE scheme.


We will first define our random IBE oracle, $O_{\mathrm{IBE}}$, also denoted by O for short
(which trivially implies a CCA secure IBE as outlined in Remark 21), and then
break any $\tau$-TPE (with a constant $\tau$) relative to this oracle.


Construction 20 (Randomized oracle O = (g, k, id, e, d)). By $O_\lambda$ we refer
to the part of O whose answers are $\lambda$ bits, and O is the union of $O_\lambda$ for all $\lambda$.


— The master-key generating oracle $g : \{0,1\}^\lambda \to \{0,1\}^\lambda$ is a random
permutation that takes as input a secret-key $sk \in \{0,1\}^\lambda$, and returns a public-key
$pk \in \{0,1\}^\lambda$.

— The decryption-key generating oracle $k : \{0,1\}^{2\lambda} \to \{0,1\}^\lambda$ takes as input a
secret-key $sk \in \{0,1\}^\lambda$ and an identity $a \in \{0,1\}^\lambda$, and returns a
decryption-key $dk_a \in \{0,1\}^\lambda$. We require $k(sk, \cdot)$ to be a random permutation over
$\{0,1\}^\lambda$ for every $sk \in \{0,1\}^\lambda$.

— The identity finding oracle $id : \{0,1\}^{2\lambda} \to \{0,1\}^\lambda$ takes as input a public-key
$pk \in \{0,1\}^\lambda$ and a decryption-key $dk \in \{0,1\}^\lambda$, and returns the unique a
such that $k(sk, a) = dk$, where $sk = g^{-1}(pk)$.

— The encryption oracle $e : \{0,1\}^{3\lambda} \to \{0,1\}^\lambda$ takes as input a public-key
$pk \in \{0,1\}^\lambda$, an identity $a \in \{0,1\}^\lambda$ and a message $m \in \{0,1\}^\lambda$, and returns
a ciphertext $c \in \{0,1\}^\lambda$. We require $e(pk, a, \cdot)$ to be a random permutation
over $\{0,1\}^\lambda$ for every $(pk, a) \in \{0,1\}^{2\lambda}$.

— The decryption oracle $d : \{0,1\}^{3\lambda} \to \{0,1\}^\lambda$ takes as input a public-key
$pk \in \{0,1\}^\lambda$, a decryption-key $dk \in \{0,1\}^\lambda$ and a ciphertext $c \in \{0,1\}^\lambda$, and
returns the unique m such that $e(pk, a, m) = c$, where $a = id(pk, dk)$.


By an IBE oracle, we refer to an oracle in the support set of O, Supp(O), and
by a partial IBE oracle we refer to a partial oracle that could be extended to an
oracle in Supp(O).
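
For intuition, the five sub-oracles can be simulated for a very small $\lambda$ by storing the random permutations explicitly. This is only a toy sketch with our own naming: strings in $\{0,1\}^\lambda$ are encoded as integers in `range(2**lam)`, the per-key permutations are sampled lazily on first use, and the identity-finding oracle is called `ident` to avoid clashing with Python's built-in `id`.

```python
import random


class RandomIBEOracle:
    """Toy simulation of O_lambda = (g, k, id, e, d) with explicit permutations."""

    def __init__(self, lam, seed=None):
        self.n = 2 ** lam
        self.rng = random.Random(seed)
        self.g_perm = self._perm()              # sk -> pk
        self.g_inv = [0] * self.n               # pk -> sk
        for sk, pk in enumerate(self.g_perm):
            self.g_inv[pk] = sk
        self.k_perms = {}                       # sk -> (identity -> dk)
        self.e_perms = {}                       # (pk, identity) -> (m -> c)

    def _perm(self):
        p = list(range(self.n))
        self.rng.shuffle(p)
        return p

    def _k(self, sk):
        if sk not in self.k_perms:
            self.k_perms[sk] = self._perm()
        return self.k_perms[sk]

    def _e(self, pk, a):
        if (pk, a) not in self.e_perms:
            self.e_perms[(pk, a)] = self._perm()
        return self.e_perms[(pk, a)]

    def g(self, sk):
        return self.g_perm[sk]

    def k(self, sk, a):
        return self._k(sk)[a]

    def ident(self, pk, dk):
        # The unique identity a with k(sk, a) = dk, where sk = g^{-1}(pk).
        return self._k(self.g_inv[pk]).index(dk)

    def e(self, pk, a, m):
        return self._e(pk, a)[m]

    def d(self, pk, dk, c):
        # The unique m with e(pk, a, m) = c, where a = id(pk, dk).
        return self._e(pk, self.ident(pk, dk)).index(c)
```

Decrypting with the key of the identity used for encryption always inverts `e`, while a key for any other identity inverts an independent permutation and returns an unrelated message.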


Remark 21 (CCA secure IBE relative to O). To encrypt a bit $b \in \{0,1\}$ under
identity a and public-key pk, the encryption algorithm extends b to a $\lambda$-bit random
string $m = (b, b_1, \ldots, b_{\lambda-1})$, $b_i \leftarrow \{0,1\}$, and gets the encryption $c = e(pk, a, m)$. To
decrypt, we decrypt c and output its first bit. By independently encrypting the bits
of a message $m = (m_1, \ldots, m_n)$, with $n = \mathrm{poly}(\kappa)$, and using a standard hybrid
argument, one can generalize the scheme to arbitrarily long messages. This
construction is only CPA secure, where any adversary has advantage at most $2^{-\Omega(\lambda)}$. But
this can easily be transformed in a black-box manner into a CCA secure
construction, without incurring any additional assumptions, using the Fujisaki-Okamoto
transform in the random oracle model [4]. We note that even though O is not
exactly a random oracle, for our purposes it suffices to use one of the sub-oracles of
O as a random oracle in the above transform.

Now we present an attack that aims to break any $\tau$-TPE in an O-relativized world
by asking only $\mathrm{poly}(\kappa)$ queries to the random IBE oracle O, where $\kappa$ is the security
parameter of the $\tau$-TPE scheme. We prove the query-efficiency and the success
probability of our attack in the full version [16]. Similar to the attack of [9], our
attack can easily be implemented in $\mathrm{poly}(\kappa)$-time if P = PSPACE,¹ and the
relativizing reductions can be ruled out by adding a PSPACE oracle to O.

We first note that any black-box construction of $\tau$-TPE schemes from IBE
schemes can potentially call the oracle $O_\lambda$ over different values of $\lambda$ which are
potentially different from the security parameter of the $\tau$-TPE scheme itself.
However, similar to [9], we assume that the $\tau$-TPE scheme asks its queries to $O_\lambda$
only for one value of $\lambda$. This assumption is purely to simplify our presentation
of the attack and its analysis, and all the arguments below extend to the general
case (of asking queries over any parameter $\lambda > \log s$) in a straightforward way.

We also assume that $\lambda$ is large enough in the sense that $2^\lambda > s$ for an
arbitrarily large $s = \mathrm{poly}(\kappa)$ that can be chosen in the description of the attack.
The reason for the latter assumption is that the adversary can always ask and
learn all the oracle queries to O that are of logarithmic length $O(\lambda) = O(\log \kappa)$,
simply because there are at most $2^{O(\lambda)} = \mathrm{poly}(\kappa)$ many queries of this form.²

Construction 22 (Adv Attacking the Scheme $\tau$-TPE). The parameters
are as follows. q: the total number of queries asked by the components of the
scheme $\tau$-TPE all together, $\kappa$: the security parameter of $\tau$-TPE, $\epsilon = 1/\mathrm{poly}(\kappa)$
and $s = \mathrm{poly}(\kappa)$: input parameters to the adversary Adv, $\lambda \leq \mathrm{poly}(\kappa)$: the
parameter which determines the output length of the queries asked by the
components of $\tau$-TPE to the oracle O. It is assumed that $2^\lambda > s$ for some $s = \mathrm{poly}(\kappa)$
to be chosen later. Our adversary Adv executes the following.


1. Sampling Predicates and Attributes: Adv executes the sampling
algorithm Samp of Lemma 10 with the parameter $\epsilon$, over the predicate structure
of $\tau$-TPE, to get k + 1 pairs $(a_*, f_*), \{(a_i, f_i)\}_{i \in [k]}$. Recall that this sampling
is done only by knowing the predicate structure of $\tau$-TPE and is
independent of the actual implementation of the scheme. It can be done, for example,
without the knowledge of PK.

2. Receiving the Keys: Adv receives from the challenger the public-key PK
and the decryption-keys $\{DK_i\}_{i \in [k]}$, where $DK_i$ is the generated
decryption-key for $f_i$. We also assume that $DK_*$ is generated by the challenger, although
Adv does not receive it. Let V be the view of the algorithms executed by the
challenger so far that generated the keys $PK, DK_*, DK_1, \ldots, DK_k$. Let Q(V)
be the partial oracle consisting of the queries (and their answers) specified
in V. By writing in the bold font $\mathbf{V}$, we refer to V as a random variable.

3. Encrypting Random Bits: For all $i \in [k]$, Adv chooses a random bit $d \leftarrow
\{0,1\}$, computes the encryption $C_i \leftarrow E(PK, a_i, d)$, and then the decryption
$D(PK, DK_i, C_i)$. Let $\mathcal{L}_0$ be the partial oracle consisting of the oracle queries
(and their answers) that Adv observes in this step.


1 A good “approximation” of the attack can also be implemented assuming P = NP.
2 In [9] a scheme that asks such queries is called “degenerate” and is handled similarly.

4. Learning Heavy Queries: This step consists of some internal rounds. For
j = 1, 2, ... do the following. Let $\mathcal{L}_j$ be the partial oracle consisting of the
oracle queries (and their answers) that Adv has learned about O till the end
of the j’th round³ of this learning step. Let $\mathbf{V}_j = (\mathbf{V} \mid \mathcal{L}_j, PK, \{DK_i\}_{i \in [k]})$ be
the distribution of the random variable $\mathbf{V}$ (also including the randomness of
O) conditioned on the knowledge of $(\mathcal{L}_j, PK, \{DK_i\}_{i \in [k]})$. For a partial oracle
P, let $\overline{P}$ denote its closure.⁴ Now, if there is any query x such that $x \notin \mathcal{L}_j$
but $\Pr[x \in \overline{Q(\mathbf{V}_j)}] \geq \epsilon$, Adv asks the lexicographically first such x from the
oracle O, sets $\mathcal{L}_{j+1} = \mathcal{L}_j \cup \{(x, O(x))\}$, and goes to round j + 1. In other words,
as long as there is any new query x that is $\epsilon$-heavy to be in the closure of the
queries of the view of the key-generations, Adv asks such a query x. If no
such query exists, Adv breaks the loop and goes to the next step.

(Note that the above and the following steps may require a PSPACE-complete
oracle to be implemented efficiently.)

5. Guessing Challenger’s View: Let $\mathcal{L}$ be the partial oracle consisting of the
oracle queries (and their answers) that Adv learned in Steps 3 and 4 (i.e.,
$\mathcal{L} = \mathcal{L}_\ell$, where $\overline{Q(\mathbf{V}_\ell)}$ had no $\epsilon$-heavy queries to be learned). Let $\mathbf{V}_{final} =
(\mathbf{V} \mid \mathcal{L}, PK, \{DK_i\}_{i \in [k]})$, and sample $V' \leftarrow \mathbf{V}_{final}$. Let $SK'$ and $DK'_*$ be, in
order, the “guessed” values for the secret-key and the decryption-key of $f_*$
determined by the sampled $V'$. We note that by definition the other keys
$PK', \{DK'_i\}_{i \in [k]}$ determined by $V'$ are the same as the ones that Adv has
received: $PK, \{DK_i\}_{i \in [k]}$.

6. Receiving the Challenge and the Final Decryption: Adv receives 
$C_k (= E^{O}(PK, a_k, b))$ for a random bit $b \in \{0,1\}$. Then, Adv uses the oracle 
$O'$ defined below and outputs the decrypted value $b' \leftarrow D^{O'}(PK, DK'_k, C_k)$ as 
his guess about the bit $b$. 

The Oracle $O'$: At the beginning of the decryption of Step 6 the partially 
defined oracle $O'$ is equal to $L \cup Q(V')$, namely the learned queries (and their 
answers) together with the guessed ones specified in $V'$. Afterwards, if a new 
query $x$ is asked: (i) if $x \in O'$, return $O'(x)$, otherwise (ii) if $x \in \overline{O'}$, then 
return $y = \overline{O'}(x)$ and add $(x,y)$ to $O'$, and finally (iii) if $x \notin \overline{O'}$, ask $x$ from 
$O$ and add $(x, O(x))$ to $O'$. 
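The three-case answering rule of the oracle $O'$ can be sketched in code. This is only an illustration under stated assumptions: partial oracles are modelled as Python dictionaries, and the `closure` callback is a hypothetical stand-in for the closure operation, whose formal definition is given only in the full version.

```python
from typing import Callable, Dict, Hashable, Optional

Query = Hashable
PartialOracle = Dict[Query, object]


def make_oracle_prime(
    learned: PartialOracle,                    # L: queries learned in Steps 3 and 4
    guessed: PartialOracle,                    # Q(V'): queries fixed by the sampled view
    real_oracle: Callable[[Query], object],    # the real oracle O
    closure: Callable[[PartialOracle, Query], Optional[object]],  # hypothetical closure lookup
) -> Callable[[Query], object]:
    """Return O' as a function that fills in its table lazily."""
    # O' starts as the union of L and Q(V'); learned answers take precedence
    # (the two are expected to agree, since V' is sampled conditioned on L).
    table: PartialOracle = {**guessed, **learned}

    def oracle_prime(x: Query) -> object:
        if x in table:                  # case (i): already defined in O'
            return table[x]
        implied = closure(table, x)     # case (ii): answer implied by the closure
        if implied is not None:
            table[x] = implied
            return implied
        y = real_oracle(x)              # case (iii): fall back to the real oracle
        table[x] = y
        return y

    return oracle_prime
```

Representing "not in the closure" by `None` is a design choice of this sketch; it assumes that `None` is never a legitimate oracle answer.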


This finishes the description of our attack. We prove the query-efficiency and 
the success probability of our attack in the full version [16]. 

3 Step 3 can be thought of as the 0’th round. 
partial oracle (or its closure), e.g. if the partial oracle contains queries [g(sk) = pk] 
and [k(sk,a) = dk] then its closure must also contain the query [id(pk,dk) = a]. 
Please refer to the full version for a formal definition. 


Abstract. We consider the problem of amplifying the “lossiness” of 
functions. We say that an oracle circuit $C^* : \{0,1\}^m \to \{0,1\}^*$ amplifies 
relative lossiness from $\ell/n$ to $L/m$ if for every function 
$f : \{0,1\}^n \to \{0,1\}^n$ it holds that 

1. If $f$ is injective then so is $C^f$. 

2. If $f$ has image size of at most $2^{n-\ell}$, then $C^f$ has image size at most 
$2^{m-L}$. 

The question is whether such $C^*$ exists for $L/m \gg \ell/n$. This problem 
arises naturally in the context of cryptographic “lossy functions,” where 
the relative lossiness is the key parameter. 
We show that for every circuit $C^*$ that makes at most $t$ queries to 
$f$, the relative lossiness of $C^f$ is at most $L/m \le \ell/n + O(\log t)/n$. In 
particular, no black-box method making a polynomial $t = \mathrm{poly}(n)$ number 
of queries can amplify relative lossiness by more than an $O(\log n)/n$ 
additive term. We show that this is tight by giving a simple construction 
(cascading with some randomization) that achieves such amplification. 


1 Introduction 


Lossy trapdoor functions, introduced by Peikert and Waters [14], are a powerful 
cryptographic primitive. Soon after their introduction, they were found to be 
useful for realizing new constructions of traditional cryptographic concepts, as 
well as for demonstrating the feasibility of new ones. Their wide applicability, 
simple definition, and realizability under a variety of cryptographic assumptions 
make them a clear candidate for induction into the “pantheon” of cryptographic 
primitives. 


1.1 Lossy Trapdoor Functions 


A collection of lossy trapdoor functions consists of two families of functions. 
Functions in the first family are injective (and can be inverted using a trapdoor), 
whereas functions in the second are “lossy,” meaning that the size of their image 
is significantly smaller than the size of their domain. The security requirement 
is that the description of a function sampled from the injective family is 
computationally indistinguishable from the description of a function sampled from the 
lossy family. 

As demonstrated by Peikert and Waters, lossy trapdoor functions imply 
primitives such as trapdoor functions, collision-resistant hash functions, and 
oblivious transfer [14]. Amongst “higher level” applications, we can find 
chosen-ciphertext secure public-key encryption [14], deterministic public-key 
encryption [4], OAEP-based public-key encryption, “hedged” public-key encryption 
for protecting against bad randomness [2], security against selective opening 
attacks [8], and non-interactive universally-composable string commitments [13]. 


1.2 Relative Lossiness 


A key parameter in all the applications of lossy trapdoor functions is the amount 
of lossiness guaranteed in case that a lossy function was sampled. We say that 
a function $f : \{0,1\}^n \to \{0,1\}^n$ is $(n, \ell)$-lossy if its image size is at most 
$2^{n-\ell}$. Intuitively, this means that an application of $f$ on an input 
$x \in \{0,1\}^n$ loses at least $\ell$ bits of information, on average, about $x$. We refer 
to $\ell$ as the absolute lossiness of the function and to $\ell/n$ as the relative 
lossiness of the function. 

Peikert and Waters showed how to obtain chosen ciphertext secure encryption 
assuming relative lossiness $\ell/n = \Omega(1)$. This was subsequently improved by 
Mol and Yilek who, building on work by Rosen and Segev [16], demonstrated 
how to obtain the same result assuming relative lossiness of only $1/\mathrm{poly}(n)$. 
One-way functions, and similarly trapdoor functions and oblivious transfer, can be 
constructed assuming relative lossiness of $1/\mathrm{poly}(n)$. Collision resistant hashing 
requires relative lossiness of at least $1/2 + 1/\mathrm{poly}(n)$. All other known 
applications of lossy trapdoor functions currently assume relative lossiness that is at 
least as large as $1 - o(1)$. 

Currently, relative lossiness of 1 — o(1) seems to be necessary for most “non- 
traditional” applications of lossy trapdoor functions. While some of the known 
instantiations are able to guarantee such a high rate of lossiness, some other 
constructions fall short. Most notably, the lattice-based construction of Peikert 
and Waters [14], which is the only one based on a worst-case assumption and the 
only one for which no sub-exponential attack is known, only guarantees relative 
lossiness of $\Omega(1)$. 

High relative lossiness is also relevant for applications that do not necessitate 
it. This is because the lossiness rate typically has a pronounced effect on the 
efficiency of the resulting construction. Specifically, higher lossiness rate enables 
the use of a smaller security parameter, and in many applications also enables 
the extraction of a larger number of “information theoretic” hard-core bits from 
the underlying function. This is useful, for example, for efficiently handling long 
messages. 


1 We note that for some of these constructions (e.g., collision-resistant hashing) the 
existence of a trapdoor is not required. 


1.3 Lossiness Amplification 


All of the above leads to the question of whether, given a specific construction of 
lossy trapdoor functions, it is possible to apply an efficient transformation that 
would result in a construction with significantly higher lossiness. It can be easily 
seen that parallel evaluation of $t$ independent copies of an $(n, \ell)$-lossy function 
amplifies the absolute lossiness from $\ell$ to $t\ell$. Specifically, given an $(n, \ell)$-lossy 
function $f : \{0,1\}^n \to \{0,1\}^n$, the function $g : \{0,1\}^{tn} \to \{0,1\}^{tn}$, defined as 


$$g(x_1, \ldots, x_t) = (f(x_1), \ldots, f(x_t))$$ 


is $(tn, t\ell)$-lossy. However, this comes at the cost of blowing up the input size by 
a factor of $t$ and hence leaves the relative lossiness $\ell/n$ unchanged. What we 
are really looking for is a construction of an $(m, L)$-lossy function 
$h : \{0,1\}^m \to \{0,1\}^m$ where $L/m \gg \ell/n$. A natural candidate is sequential 
evaluation (also known as “cascading”), defined as 


$$h(x) = \underbrace{f(f(\cdots f(x) \cdots))}_{t \text{ times}}$$ 


Unfortunately, in general $h$ might not be more lossy than $f$. In particular, this 
is the case when $f$ is injective on its own range. One can do a bit better though. 
By shuffling the outputs in-between every invocation, using randomly chosen 
$r_1, \ldots, r_t$, one obtains the function 


$$h_{r_1,\ldots,r_t}(x) = f(\cdots f(f(f(x) \oplus r_1) \oplus r_2) \cdots \oplus r_t),$$ 


for which it is possible to show that, if $f$ is say $(n, 1)$-lossy, then with 
overwhelming probability over the choice of $r_1, \ldots, r_t$, the function $h_{r_1,\ldots,r_t}$ has 
relative lossiness of $\Omega(\log t)/n$. 

While already not entirely trivial, relative lossiness of $\Omega(\log t)/n$ is a fairly 
modest improvement over $\Omega(1)/n$, and would certainly not be considered 
sufficient for most applications. Still, it is not a-priori inconceivable that there exist 
more sophisticated ways to manipulate $f$ so that the relative lossiness is 
amplified in a more significant manner. In this paper, we show that an additive 
gain of $O(\log n)/n$ is actually the best one can hope for, at least with respect to 
black-box constructions. 
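To make the three candidate transformations concrete, the following sketch evaluates them on toy lookup tables and counts image sizes by brute force. It is only an illustration: the helper names (`make_lossy`, `parallel`, `cascade`, `randomized_cascade`) are hypothetical, functions are represented as Python lists indexed by $n$-bit integers, and the randomized cascade is assumed to XOR a value $r_i$ into the output of each invocation before the next one.

```python
import random
from itertools import product


def make_lossy(n, ell, rng):
    """Random balanced (n, ell)-lossy function as a lookup table:
    every image value has exactly 2**ell preimages."""
    size = 1 << n
    images = rng.sample(range(size), size >> ell)   # 2**(n-ell) distinct images
    inputs = list(range(size))
    rng.shuffle(inputs)
    table = [0] * size
    for idx, x in enumerate(inputs):
        table[x] = images[idx >> ell]               # blocks of 2**ell inputs share an image
    return table


def image_size(fn, inputs):
    """Number of distinct outputs of fn over the given inputs."""
    return len({fn(x) for x in inputs})


def parallel(table):
    """g(x_1, ..., x_t) = (f(x_1), ..., f(x_t))."""
    return lambda xs: tuple(table[x] for x in xs)


def cascade(table, t):
    """h(x) = f(f(...f(x)...)), t applications of f."""
    def h(x):
        for _ in range(t):
            x = table[x]
        return x
    return h


def randomized_cascade(table, rs):
    """h_{r_1..r_t}(x) = f(...f(f(f(x) ^ r_1) ^ r_2)... ^ r_t)."""
    def h(x):
        x = table[x]
        for r in rs:
            x = table[x ^ r]
        return x
    return h
```

For example, the table `[x & ~1 for x in range(1 << n)]` (clear the lowest bit) is $(n,1)$-lossy and injective on its own range, so plain cascading leaves its image size at $2^{n-1}$ no matter how large $t$ is.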




1.4 Our Results 


We show that no efficient black-box amplification method can additively improve 
the relative lossiness of a given function $f$ by more than $O(\log n)/n$. To this end, 
we consider a circuit $C^* : \{0,1\}^m \to \{0,1\}^*$ with oracle access to a function 
$f : \{0,1\}^n \to \{0,1\}^n$ such that the following hold: 


1. If $f$ is injective then so is $C^f$. 
2. If $f$ has image size of at most $2^{n-\ell}$, then $C^f$ has image size at most $2^{m-L}$. 

Our main result is that, if $\ell < n - \omega(\log n)$, then for every $C^*$ that makes at 
most $t$ queries to $f$, the relative lossiness, $L/m$, of $C^f$ is at most $(\ell + O(\log t))/n$. 
The impossibility result holds regardless of whether the injective mode of $f$ has 
a trapdoor, and rules out even probabilistic constructions $C^*$ (i.e., ones which 
amplify lossiness only with high probability over the choice of some randomness). 
In Section 2 we provide a high-level overview of our approach, and in Section 
3 we formally present our proof. We then show (in Section 4) how to extend 
the above result to a “full fledged” cryptographic setting, in which one does 
not simply get black-box access to a single lossy or injective function $f$. In this 
setting, lossy functions are defined by a triple of algorithms $\{g_0, g_1, f\}$, where 
one requires that a function $f_k$ is injective if the key is sampled by $k \leftarrow g_1$, and 
lossy if the key is sampled by $k \leftarrow g_0$. Moreover, the distributions generated by 
the injective and lossy key generation algorithms $g_0, g_1$ must be computationally 
indistinguishable. 


1.5 Relation to the Collision Problem 


Closely related to our setting is the collision problem, in which one is given 
black-box access to a function $f : \{0,1\}^n \to \{0,1\}^n$ and is required to 
distinguish between the case that $f$ is injective and the case that it is $2^\ell$-to-1. A simple 
argument shows that any (randomized) classical algorithm that tries to 
distinguish between the cases must make $\Omega(2^{(n-\ell)/2})$ calls to $f$. Kutin [11], extending 
work of Aaronson and Shi [1], proves an analogous bound of $\Omega(2^{(n-\ell)/3})$ in the 
quantum setting. 

Lower bounds on the collision problem can be seen to directly imply a weak 
version of our results. Specifically, if non-trivial lossiness amplification were 
possible then one could have applied it, and then invoked known upper bounds 
for the collision problem (either $O(2^{(n-\ell)/2})$ randomized classical or $O(2^{(n-\ell)/3})$ 
quantum), resulting in a violation of the corresponding lower bounds. However, 
this approach will only work if the amplification circuit does not blow up $f$’s 
input size (specifically, only if $m < n + (L - \ell)$). In contrast, our results also hold 
with respect to arbitrary input blow-up. 


1.6 Related Work 


Several instantiations of lossy trapdoor functions guarantee relative lossiness of 
$1 - o(1)$. Peikert and Waters present constructions based on the Decisional 
Diffie-Hellman assumption [14]. These are further simplified by Freeman et al., who also 
present a generalization based on the d-linear assumption [6]. Boldyreva et al. 
[4], and independently Freeman et al. [6], present a direct construction based on 
Paillier’s Composite Residuosity assumption. 

Hemenway and Ostrovsky [7] generalize the approach of Peikert and Waters, 
and obtain relative lossiness of $1 - o(1)$ from any homomorphic hash proof system 
(a natural variant of hash proof systems [5]). In turn, this implies a unified 
construction based on either Decisional Diffie-Hellman, Quadratic Residuosity, 
or Paillier’s Composite Residuosity assumptions. 

Constructions with relative lossiness $\Omega(1)$ are known based on the hardness of 
the “learning with errors” problem, which is implied by the worst case hardness 
of various lattice problems [14]. Kiltz et al. argue that RSA with exponent $e$ 
satisfies relative lossiness $(\log e)/n$ under the phi-hiding assumption, and that 
use of multi-prime RSA increases relative lossiness up to $(m \log e)/n$ where $m$ 
is the number of prime factors of the modulus [10]. Finally, Freeman et al. [6] 
propose an instantiation based on the Quadratic Residuosity assumption with 
relative lossiness of $\Omega(1/n)$. 


1.7 On Black-Box Separations 


The use of black-box separations between cryptographic primitives was pioneered 
by Impagliazzo and Rudich [9], who proved that there is no black-box con- 
struction of a key-exchange protocol from a one-way permutation. Since then, 
black-box separations have become the standard tool for demonstrating such 
assertions. We note that our main result is “unconditional”, in the sense that it 
holds regardless of any cryptographic assumption. Our “cryptographic” result, 
in contrast, is more standard in that it relies on the indistinguishability property 
of lossy functions (see the work of Reingold et al. [15] for an extensive discussion 
on black-box separations). 

Strictly speaking, it is not clear whether black-box separations should be 
interpreted as strong impossibility results. Certainly not as long as non-black- 
box techniques are still conceivable. Nevertheless, since as far as we know any of 
the primitives could exist unconditionally (cf. [8]), it is currently not clear how 
else one could have gone about proving cryptographic lower bounds. In addition, 
most of the known constructions and reductions in cryptography are black-box. 
Knowing that no such technique can be used to establish an implication serves 
as a good guideline when searching for a solution. Indeed, it would be extremely 
interesting to see if non-black box techniques are applicable in the context of 
lossy function amplification. 


2 Overview of Our Approach 


We say that a function $f : \{0,1\}^n \to \{0,1\}^n$ is $(n, \ell)$-lossy if its image 
$\{f(x) : x \in \{0,1\}^n\}$ has size at most $2^{n-\ell}$. We refer to $\ell$ as the absolute 
lossiness, and $\ell/n$ as the relative lossiness of $f$. An $(n, \ell)$-lossy function $f$ is 
balanced if $f(x)$ has exactly $2^\ell$ preimages for every $x \in \{0,1\}^n$, i.e. 
$|\{z : f(z) = f(x)\}| = 2^\ell$. We denote with $\mathcal{F}_{n,\ell}$ the set of all balanced 
$(n, \ell)$-lossy functions. 


Definition 2.1 (Lossiness amplification). We say that an oracle circuit 
$C^* : \{0,1\}^m \to \{0,1\}^*$ amplifies the relative lossiness from $\ell/n$ to $L/m$ if 


1. for every injective function $f_0$ over $\{0,1\}^n$, $C^{f_0}$ is injective. 
2. for every $f_1 : \{0,1\}^n \to \{0,1\}^n$ with image size $2^{n-\ell}$, the image of $C^{f_1}$ has 
size at most $2^{m-L}$. 

We say $C^*$ weakly amplifies if $C^*$ is probabilistic and the second item above only 
holds with probability $> 0.9$ over the choice of $C^*$’s randomness. 
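As a small illustration of these definitions, the helpers below measure the absolute and relative lossiness of a function given as a lookup table and check whether it is balanced. The list-based representation and the function names are assumptions made for this sketch, not notation from the paper.

```python
import math
from collections import Counter


def absolute_lossiness(table):
    """Largest ell such that the table (a function on n-bit inputs,
    with len(table) == 2**n) has image size at most 2**(n - ell)."""
    n = math.log2(len(table))
    return n - math.log2(len(set(table)))


def relative_lossiness(table):
    """Absolute lossiness divided by the input length n."""
    return absolute_lossiness(table) / math.log2(len(table))


def is_balanced(table, ell):
    """True if every image value has exactly 2**ell preimages."""
    return all(count == 2 ** ell for count in Counter(table).values())
```

For instance, the table `[x >> 1 for x in range(16)]` describes a balanced $(4,1)$-lossy function with relative lossiness $1/4$, while any permutation has lossiness $0$.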


Remark 2.2 (Permutations vs. injective functions). In order to make our 
negative result as strong and general as possible, we require the oracle to be length 
preserving (and thus the injective fo is a permutation), whereas the input and 
output domain of C* can be arbitrary. 


For concreteness, in this proof sketch we only consider the case $\ell = 1$. We will 
also assume that $m = nk$ is an integer multiple of $n$. The basic idea of our proof 
is to show that for any $C^*$, property 1 of Definition 2.1 implies that $C^{f_1}$ has 
very low collision probability if $f_1 \in \mathcal{F}_{n,1}$ is a randomly chosen 2-to-1 function. 
More concretely, let $t$ denote the number of oracle gates in $C^*$ and assume we 
could prove that 


$$\Pr_{X,Y \in \{0,1\}^{kn},\ f_1 \in \mathcal{F}_{n,1}}[C^{f_1}(X) = C^{f_1}(Y)] \le 2^{-kn + O(k \log t)} \qquad (1)$$ 


Such a low collision probability implies that $C^{f_1}$ must have a large range and thus 
cannot be too lossy. In particular, Eq. (1) implies that the absolute lossiness of 
$C^{f_1}$ is at most $O(k \log t)$, or equivalently, the relative lossiness is 
$O(k \log t)/kn = O(\log t)/n$, which matches (ignoring the constant hidden in the 
big-oh) the lossiness of the construction $h_{r_1,\ldots,r_t}$ from Section 1.3. Unfortunately 
Eq. (1) is not quite true. For example consider a circuit 
$C^* : \{0,1\}^{kn} \to \{0,1\}^{kn}$ which makes 
only $t = 2$ queries to its oracle and is defined as 


$$C^f(x_1, x_2, \ldots, x_k) = \begin{cases} 0^{kn} & \text{if } f(x_1) = f(x_2) \text{ and } x_1 \neq x_2 \\ (x_1, x_2, \ldots, x_k) & \text{otherwise} \end{cases}$$ 


If $f_0 : \{0,1\}^n \to \{0,1\}^n$ is a permutation, so is $C^{f_0}$ (in fact, it’s the identity 
function), thus property 1 holds. On the other hand, for any $(n,1)$-lossy $f_1$ 
we have $f_1(x_1) = f_1(x_2)$ and $x_1 \neq x_2$ with probability $2^{-n}$ for uniform $x_1, x_2$. 
Thus the probability that $C^{f_1}$ outputs $0^{kn}$ on a random input is also $2^{-n}$, which 
implies 


$$\Pr_{X,Y \in \{0,1\}^{kn}}[C^{f_1}(X) = C^{f_1}(Y)] \ge \Pr_{X,Y \in \{0,1\}^{kn}}[C^{f_1}(X) = C^{f_1}(Y) = 0^{kn}] \ge 2^{-2n}$$ 


contradicting Eq. (1) for k > 2. 
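The counterexample can be checked by brute force for tiny parameters. The sketch below is illustrative only: inputs are modelled as tuples of $k$ integers of $n$ bits each, the oracle is any Python callable, and the function names are hypothetical.

```python
from itertools import product


def counterexample_circuit(f, xs):
    """C^f(x_1, ..., x_k): output the all-zero string if the first two blocks
    form a non-trivial collision of f, otherwise act as the identity."""
    x1, x2 = xs[0], xs[1]
    if f(x1) == f(x2) and x1 != x2:
        return (0,) * len(xs)
    return tuple(xs)


def count_zero_outputs(f, n, k):
    """Number of inputs in ({0,1}^n)^k that the circuit maps to the zero string."""
    zero = (0,) * k
    return sum(
        1
        for xs in product(range(1 << n), repeat=k)
        if counterexample_circuit(f, xs) == zero
    )
```

With $n = k = 3$ and the 2-to-1 function $x \mapsto \lfloor x/2 \rfloor$, each choice of $x_1$ has exactly one colliding partner $x_2$, so $2^{(k-1)n} = 64$ inputs collapse to the zero string (plus the zero input itself), which is a $2^{-n}$ fraction of the domain as claimed.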

The idea behind the counterexample $C^f$ is to query $f$ on two random inputs 
and check if $f$ collides on these inputs. If this is the case, $C^f$ “knows” that $f$ 
is not a permutation, so property 1 no longer requires $C^f$ to be a permutation 
itself, and in this case it maps to some fixed output. Although Eq. (1) is wrong, 
we can prove a slightly weaker statement, where we exclude inputs $X$ where the 
evaluation of $C^f$ on $X$ involves two invocations of $f$ on inputs $x \neq x'$ where 
$f(x) = f(x')$ (we will call such bad inputs “burned”). As with high probability, 
for a random $(n,1)$-lossy $f$, most inputs are not burned, already this weaker 
statement implies that $C^f$ has large range. 

The cryptographic setting. In a cryptographic setting, one usually does not 
simply get black-box access to a single lossy or injective function $f$, but lossy 
functions are defined by a collection (indexed by a security parameter $\lambda$) of 
triples of algorithms $\{g_0, g_1, f\}_{\lambda \in \mathbb{N}}$, where one requires that $f(k,\cdot)$ is injective 
if the key is sampled by $k \leftarrow g_1$, and lossy if the key is sampled by $k \leftarrow g_0$. 
Moreover the distributions generated by the injective and lossy key generation 
algorithms $g_0, g_1$ must be computationally indistinguishable. 

In this setting one can potentially do more sophisticated amplification than 
what is captured by Definition 2.1, e.g. by somehow using the key-generation 
algorithms $g_0, g_1$. In Section 4 we prove that black-box lossiness amplification is 
not possible in this setting either. 

In a nutshell, we show that constructions which amplify collections of lossy 
functions can be classified in two classes depending on whether the lossiness of 
the construction depends only on the lossiness of the oracle (we call such ampli- 
fiers “non-communicating” ) or if the property of being lossy is somehow encoded 
into the key. In the first case, the proof goes along the lines of the proof of 
Theorem 3.1 (in particular, amplifiers as in Definition 2.1 are “non-communicating” 
as there’s not even a key). In the second case, where the construction is “com- 
municating”, we show that the output of the key-generation algorithms (of the 
amplified construction) will not always be indistinguishable. This proof borrows 
ideas from the work of Impagliazzo and Rudich [9] who show that one cannot 
construct a key-agreement from one-way permutations. Their proof shows that 
for any two parties Alice and Bob who can communicate over a public channel 
and who have access to random oracle R, there exists an adversary Eve who can 
with high probability make all queries to R that both Alice and Bob made. 
As a consequence, Alice and Bob cannot use R to “secretly” communicate. In 
a similar vein we show that the lossy key-generation algorithm cannot “commu- 
nicate” the fact that the key it outputs is lossy to the evaluation function or we 
can catch it, and thus distinguish lossy from injective keys. 


3 An Upper Bound on Black-Box Lossiness Amplification 


We now state our main theorem, asserting that simple sequential composition is 
basically the best black-box amplification that can be achieved. 


Theorem 3.1 (Impossibility of Black-Box Amplification). Consider any 
$n, \ell, t \in \mathbb{N}$ where 

$$n > \ell + 2\log t + 2 \qquad (2)$$ 

and any oracle aided circuit $C^* : \{0,1\}^m \to \{0,1\}^*$ which makes $t$ oracle queries 
per invocation, then the following holds: If $C^*$ weakly amplifies relative lossiness 
from $\ell/n$ to $L/n$ ${}^2$ then $L < \ell + 3\log t + 4$. More concretely, for a random 
$f \in \mathcal{F}_{n,\ell}$ the construction $C^f$ will have relative lossiness less than 
$(\ell + 3\log t + 4)/n$ with probability at least 1/2. 
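To get a feeling for the numbers, the bound of Theorem 3.1 can be evaluated for concrete parameters. The helper below is only a worked-arithmetic sketch (the function name is hypothetical, and logarithms are taken to base 2); it rejects parameters that violate the precondition (2).

```python
import math


def relative_lossiness_bound(n, ell, t):
    """Upper bound (ell + 3*log2(t) + 4) / n on the relative lossiness of C^f,
    valid under the precondition n > ell + 2*log2(t) + 2 of Eq. (2)."""
    log_t = math.log2(t)
    if not n > ell + 2 * log_t + 2:
        raise ValueError("precondition n > ell + 2 log t + 2 is violated")
    return (ell + 3 * log_t + 4) / n
```

For example, with $n = 1024$, $\ell = 1$ and $t = 2^{10}$ queries, the relative lossiness goes from $1/1024$ to less than $35/1024$, so the gain stays logarithmic in $t$.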


2 Note that we denote the relative lossiness of $C^*$ by $L/n$, not $L/m$ like in the previous 
sections. In particular, the absolute lossiness of $C^*$ is $Lm/n$ (not $L$). 

Remark 3.2. The bound $n > \ell + 2\log t + 2$ is basically tight, as for 
$n = \ell + 2\log t - O(1)$ one can with constant advantage $p$ distinguish any $(n, \ell)$-lossy 
function from an injective one by simply making $t$ random queries and looking 
for a collision. The exact value of $p$ depends on the $O(1)$ term, in particular, 
replacing the $O(1)$ with a sufficiently large constant we get a $p > .9$ as required 
by Definition 2.1. Then $C^f(x)$ which outputs $x$ if no such collision is found, and 
some fixed value (say $0^m$) otherwise is a weak amplifier as in Definition 2.1. 
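The collision-based distinguisher from this remark, and the trivial amplifier built on it, can be sketched as follows. This is an illustration under assumptions: the oracle is a Python callable on $n$-bit integers, the fixed output $0^m$ is represented by the integer `0`, and both function names are hypothetical.

```python
import random


def sees_collision(f, n, t, rng):
    """Make t uniformly random queries to f and report whether two
    distinct inputs were observed to share an output."""
    first_preimage = {}
    for _ in range(t):
        x = rng.randrange(1 << n)
        y = f(x)
        if y in first_preimage and first_preimage[y] != x:
            return True
        first_preimage.setdefault(y, x)
    return False


def trivial_amplifier(f, n, t, x, rng):
    """C^f(x): output x if no collision is found among t random queries,
    and a fixed value (here 0) otherwise."""
    return 0 if sees_collision(f, n, t, rng) else x
```

For an injective `f` no collision can ever be observed, so the circuit is the identity; for a sufficiently lossy `f` the collision is found with high probability and almost all randomness choices send every input to the same fixed value.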


Remark 3.3 (Probabilistic $C^*$ vs. random $f$). Instead of considering a 
probabilistic $C^*$ and constructing a particular lossy $f$ such that $C^f$ is not too 
lossy with high probability over $C^*$’s randomness (as required by Definition 2.1), 
we consider a deterministic $C^*$ and show that $C^f$ fails to be lossy with high 
probability for a randomly chosen $f$. As $f$ is sampled independently of (the description 
of) $C^*$, the latter implies the former. 


Below we formally define what we mean by an input being burned as already 
outlined. 


Definition 3.4 (Burned input). For $X \in \{0,1\}^m$, we denote with $in(X)$ and 
$out(X)$ the inputs and outputs of the $t$ invocations of $f$ in an evaluation of 
$C^f(X)$. Consider an input $X \in \{0,1\}^m$ and let $\{x_1, \ldots, x_t\} \leftarrow in(X)$; we say 
that $X$ is burned if for some $1 \le i < j \le t$, $x_i \neq x_j$ and $f(x_i) = f(x_j)$. $\phi(X)$ 
denotes the event that X is burned. 
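Whether an input is burned can be decided by recording the oracle queries made during one evaluation. The sketch below assumes, purely for illustration, that a circuit is given as a Python callable `circuit(oracle, X)` that receives the oracle as its first argument.

```python
def is_burned(circuit, f, X):
    """Evaluate circuit(oracle, X) with a recording oracle and report whether
    two distinct queried inputs received the same answer from f."""
    answers = {}                      # queried input -> oracle answer

    def recording_oracle(x):
        y = f(x)
        answers[x] = y
        return y

    circuit(recording_oracle, X)
    outputs = list(answers.values())  # one entry per distinct queried input
    return len(set(outputs)) < len(outputs)
```

Repeating the same query twice does not burn an input, since the dictionary keeps one entry per distinct query; only a genuine collision $x_i \neq x_j$ with $f(x_i) = f(x_j)$ does.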


Below is the main technical Lemma which we will use to prove Theorem 3.1 
(recall that $m = nk$). 


Lemma 3.5. For a random balanced $(n,\ell)$-lossy function $f$, and two random 
inputs $X, Y$, the probability that $X, Y$ are colliding inputs for $C^f$ and at the 
same time both are not burned can be upper bounded as 

$$\Pr_{f \in \mathcal{F}_{n,\ell},\ X,Y \in \{0,1\}^m}[(C^f(X) = C^f(Y)) \wedge \neg\phi(X) \wedge \neg\phi(Y)] \le 2^{-kn + k(3\log t + \ell)} \qquad (3)$$ 

We postpone the proof of this Lemma to Section 3.1. The following simple claim 
upper bounds the probability (over the choice of $f \in \mathcal{F}_{n,\ell}$) that an input $x$ to 
$C^f$ is burned. 


Claim 3.6. For any $x \in \{0,1\}^m$ 

$$\Pr_{f \in \mathcal{F}_{n,\ell}}[\phi(x)] \le \frac{2^\ell t^2}{2^n} \qquad (4)$$ 


Proof. For $i \in \{1, \ldots, t\}$, the probability that the $i$th query to $f$ made during 
the evaluation of $C^f(x)$ provides a collision for $f$ (assuming there’s been no 
collision so far) is at most $\frac{(i-1)(2^\ell - 1)}{2^n - i + 1}$. To see this, note that as $f$ is balanced, 
there are exactly $(i-1)(2^\ell - 1)$ possible inputs which will lead to a collision 
as each of the $(i-1)$ queries we did so far has $2^\ell - 1$ other preimages. As $f$ is 
random, the probability that the $i$th query (for which there are $2^n - i + 1$ choices) 
will hit one of these values is $\frac{(i-1)(2^\ell - 1)}{2^n - i + 1}$. The claim follows by taking the union 
bound over all $i$: 

$$\Pr_{f \in \mathcal{F}_{n,\ell}}[\phi(x)] \le \sum_{i=1}^{t} \frac{(i-1)(2^\ell - 1)}{2^n - i + 1} \le \frac{2^\ell t^2}{2^n}$$ 

The second step above used $t < 2^n/2$, which is implied by Eq. (2) (so that every 
denominator is at least $2^{n-1}$), together with $\sum_{i=1}^{t}(i-1) \le t^2/2$. □ 
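The union bound in the proof of Claim 3.6 can be verified with exact rational arithmetic for small parameters. This is only a sanity-check sketch: it assumes that the $i$th fresh query has $2^n - i + 1$ candidate values, and the function names are hypothetical.

```python
from fractions import Fraction


def union_bound_sum(n, ell, t):
    """Sum over i = 1..t of (i-1)(2^ell - 1) / (2^n - i + 1)."""
    return sum(
        Fraction((i - 1) * (2 ** ell - 1), 2 ** n - i + 1)
        for i in range(1, t + 1)
    )


def claim_bound(n, ell, t):
    """Right-hand side of Eq. (4): 2^ell * t^2 / 2^n."""
    return Fraction(2 ** ell * t ** 2, 2 ** n)
```

Whenever $t \le 2^n/2$, `union_bound_sum(n, ell, t)` should not exceed `claim_bound(n, ell, t)`, which mirrors the second step of the derivation.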


Proof of Theorem 3.1. Consider a $C^*$ as in the statement of the theorem and 
a random $f \in \mathcal{F}_{n,\ell}$. Let $\Phi \stackrel{\text{def}}{=} \{x \in \{0,1\}^m : \phi(x)\}$ denote the set of inputs 
which are burned (cf. Definition 3.4) and $\overline{\Phi} = \{0,1\}^m \setminus \Phi$. Using the chain rule, 
we can state eq. (3) as 

$$\Pr_{f \in \mathcal{F}_{n,\ell},\ X,Y \in \overline{\Phi}}[C^f(X) = C^f(Y)] \le \frac{2^{-kn + k(3\log t + \ell)}}{\Pr_{f \in \mathcal{F}_{n,\ell},\ X,Y \in \{0,1\}^m}[X \in \overline{\Phi} \wedge Y \in \overline{\Phi}]} \qquad (5)$$ 

Using eq. (4) we can bound the expected size (over the choice of $f \in \mathcal{F}_{n,\ell}$) of $\Phi$ 
as 

$$E_{f \in \mathcal{F}_{n,\ell}}[|\Phi|] = \sum_{X \in \{0,1\}^m} \Pr_{f \in \mathcal{F}_{n,\ell}}[\phi(X)] \le 2^m \cdot \frac{2^\ell t^2}{2^n}$$ 

Using the Markov inequality and eq. (2), this implies that $\Phi$ is not too big, say 
at most half of the domain $\{0,1\}^m$, with probability at least 1/2: 

$$\Pr_{f \in \mathcal{F}_{n,\ell}}[|\Phi| \ge 2^{m-1}] \le \frac{E_{f \in \mathcal{F}_{n,\ell}}[|\Phi|]}{2^{m-1}} \le \frac{2^\ell t^2}{2^{n-1}} \le 1/2$$ 

By the above equation, $|\overline{\Phi}| \ge 2^{m-1}$ with probability $\ge 1/2$ over the choice of $f$, 
and for such a “good” $f$, two random $X, Y$ are in $\overline{\Phi}$ with probability at least 
$(1/2)^2 = 1/4$. Thus the denominator on the right side of eq. (5) is at least 1/8; 
replacing the denominator in eq. (5) with $2^{-3} = 1/8$ we get 

$$\Pr_{f \in \mathcal{F}_{n,\ell},\ X,Y \in \overline{\Phi}}[C^f(X) = C^f(Y)] \le 2^{-kn + k(3\log t + \ell) + 3} \qquad (6)$$ 

Again using Markov, this means that for a randomly chosen $f \in \mathcal{F}_{n,\ell}$, with 
probability at least 1/2 

$$\Pr_{X,Y \in \overline{\Phi}}[C^f(X) = C^f(Y)] \le 2^{-kn + k(3\log t + \ell) + 4} \qquad (7)$$ 

As two values sampled independently from a distribution with support of size $u$ 
collide with probability at least $1/u$ (this is tight if the distribution is flat), eq. (7) 
implies that the range of $C^f$ must be at least of size $2^{kn - k(3\log t + \ell) - 4}$. Thus the 
relative lossiness (recall that $m = nk$) is 
$(k\ell + 3k\log t + 4)/kn \le (\ell + 3\log t + 4)/n$. □ 


3.1 Proof of Lemma 3.5 


We consider a random experiment denoted $\Gamma$ where $X_0, Y_0 \in \{0,1\}^m$ and 
$f \in \mathcal{F}_{n,\ell}$ are chosen at random, and then $C^f(X_0)$ and $C^f(Y_0)$ are evaluated. 
These evaluations result in $2t$ invocations of $f$. Let $\{x_1, \ldots, x_t\} \leftarrow in(X_0)$ and 
$\{X_1, \ldots, X_t\} \leftarrow out(X_0)$ denote the inputs and outputs of $f$ in the evaluation 
of $C^f(X_0)$. Analogously we define values $y_i, Y_i$ occurring in the evaluation of 
$C^f(Y_0)$. For $I \subseteq \{1, \ldots, t\}$, we define an event $E_I$ which holds if for every $i \in I$ 
(and only for such $i$), there exists a $j$ such that $y_i \neq x_j$ and $f(y_i) = f(x_j)$ and 
$y_i \neq y_k$ for all $k < i$ (i.e. we have a fresh, non-trivial collision). $\Gamma$ defines a 
“transcript” 


$$v_{f,X_0,Y_0} \stackrel{\text{def}}{=} \{X_0, Y_0, x_1, \ldots, x_t, f(x_1), \ldots, f(x_t), y_1, \ldots, y_t, f(y_1), \ldots, f(y_t)\}$$ 


The values $x_i$ and $y_i$ in the transcript are redundant, i.e., they can be computed 
from values $X_0, Y_0, f(x_i)$ and $f(y_i)$, and are only added for convenience. For 
$I \subseteq \{1, \ldots, t\}$ we define $V_I$ as all transcripts where (1) both inputs are not 
burned (2) we have a collision and (3) $E_I$ holds, i.e. 


$$V_I \stackrel{\text{def}}{=} \{v_{f,X_0,Y_0} : \neg\phi(X_0) \wedge \neg\phi(Y_0) \wedge (C^f(X_0) = C^f(Y_0)) \wedge E_I\}$$ 

$V_{col}$ is the union of all $V_I$, i.e. 

$$V_{col} = \bigcup_I V_I = \{v_{f,X_0,Y_0} : \neg\phi(X_0) \wedge \neg\phi(Y_0) \wedge C^f(X_0) = C^f(Y_0)\} \qquad (8)$$ 


For a set of transcripts $V$, we denote with $\Pr_\Gamma[V]$ the probability that the 
transcript generated by $\Gamma$ is in $V$. It is not hard to see${}^3$ that $\Pr_\Gamma[V_\emptyset] \le 2^{-nk}$; we 
prove that this bound (up to a factor 2) holds for any $V_I$. 


Lemma 3.7. For any $I \subseteq \{1, \ldots, t\}$ we have (recall that $m = nk$) 

$$\Pr_\Gamma[V_I] \le 2^{-nk+1}$$ 


We postpone the proof of this main technical lemma and first prove how it implies 
Theorem 3.1. But let us here give some intuition as to why Lemma 3.7 holds. 
The experiment $\Gamma$ generates a transcript in $V_I$ if (besides $C^f(X_0) = C^f(Y_0)$ 
colliding and $X_0, Y_0$ not being burnt) for every $i \in I$, the $i$th invocation of $f$ 
during the evaluation of $C^f(Y_0)$ produces a fresh collision. Now, conditioned 
on such a collision happening, the probability of actually getting a collision 
$C^f(X_0) = C^f(Y_0)$ can potentially rise significantly (by something like $2^{n-\ell}$) as 
this is a rare event, but then, the probability of having such a collision is also 
around $2^{-(n-\ell)}$, and if this collision does not occur, we definitely will not end up 
with a transcript in $V_I$. These two probabilities even out, and we end up with 
roughly the same probability for a transcript in $V_I$ as we had for $V_\emptyset$. 

Before we can prove the theorem we need one more lemma, which bounds the 
probability of $\Gamma$ generating a transcript with lots ($k$ or more) collisions. 


3 We have $\Pr_\Gamma[V_\emptyset] \le \Pr_\Gamma[X_0 = Y_0] = 2^{-nk}$. The second step follows as 
$X_0, Y_0 \in \{0,1\}^{nk}$ are uniformly random. The first step follows as 
$\neg\phi(X_0), \neg\phi(Y_0)$ and $E_\emptyset$ 
together imply that there are no collisions in the $2t$ invocations of $f$, and thus $f$ is 
“consistent” with being a permutation. But in this case, $C^f(X_0) = C^f(Y_0)$ implies 
$X_0 = Y_0$. 


Lemma 3.8 

$$\sum_{I : |I| \ge k} \Pr_\Gamma[V_I] \le \sum_{I : |I| \ge k} \Pr_\Gamma[E_I] \le 2^{k(\ell + 2\log t - (n-1))} \qquad (9)$$ 


Proof. The first step of Eq. (9) follows as $V_I$ implies $E_I$. Let $E_I^+$ denote the 
event which holds if $E_{I'}$ holds for any $I' \supseteq I$. We have 

$$\Pr_\Gamma[E_I^+] \le \left(\frac{(2^\ell - 1)t}{2^n - 2t}\right)^{|I|} \le \left(\frac{2^\ell t}{2^{n-1}}\right)^{|I|} \qquad (10)$$ 

To see this, note that to get $E_I^+$, in every step $i \in I$, $y_i$ must be fresh, and 
then $f(y_i)$ must “hit” one of the at most $t$ distinct $f(x_j)$. As $f$ is a random $2^\ell$-to-1 
function evaluated on at most $2t$ inputs, this probability can be upper bounded 
by $(2^\ell - 1)t/(2^n - 2t)$ as at most $(2^\ell - 1)t$ of the at least $2^n - 2t$ fresh inputs can 
“hit” as described above. The probability that we have such a “hit” for all $i \in I$ 
is the $|I|$th power of this probability. The number of different $I$ where $|I| = k$ 
can be upper bounded by $2^{k\log t}$; using this and Eq. (10) we get 

$$\sum_{I : |I| \ge k} \Pr_\Gamma[E_I] \le \sum_{I : |I| = k} \Pr_\Gamma[E_I^+] \le 2^{k\log t} \cdot \frac{2^{k\ell} t^k}{2^{(n-1)k}} = 2^{k(\ell + 2\log t - (n-1))}$$ 

□ 

Proof of Lemma 3.5. Lemma 3.5 states that $\Pr_\Gamma[V_{col}] \le 2^{-kn + k(3\log t + \ell)}$, where 
by Eq. (8) we can write the left-hand side as 

$$\Pr_\Gamma[V_{col}] = \sum_{I : |I| < k} \Pr_\Gamma[V_I] + \sum_{I : |I| \ge k} \Pr_\Gamma[V_I]$$ 

Using Lemmas 3.7 and 3.8 and the fact that there are at most $t^k$ different $I$’s 
with $|I| < k$, we get 

$$\sum_{I : |I| < k} \Pr_\Gamma[V_I] + \sum_{I : |I| \ge k} \Pr_\Gamma[V_I] \le t^k \cdot 2^{-nk+1} + 2^{k(\ell + 2\log t - (n-1))}$$ 
$$= 2^{-nk + k\log t + 1} + 2^{k(\ell + 2\log t - (n-1))}$$ 
$$\le 2^{-nk + k(3\log t + \ell)}$$ 

□ 

Proof of Lemma 3.7. For any $I$, we consider a new random experiment $\Gamma_I$. 
This experiment will define a distribution $X'_I \in \{0,1\}^m$, $Y'_I \in \{0,1\}^m \cup \{\bot\}$. We’ll 
show that 

$$\Pr_{\Gamma_I}[X'_I = Y'_I] \le 2^{-m} \qquad (11)$$ 

and 

$$\Pr_\Gamma[V_I] \le 2 \cdot \Pr_{\Gamma_I}[X'_I = Y'_I] \qquad (12)$$ 

Note that the two equations above imply Lemma 3.7. The experiment $\Gamma_I$ is 
defined as follows 

1. We sample random $X'_0, Y'_0 \in \{0,1\}^m$ and a random permutation $g$ over 
$\{0,1\}^n$. 
2. Let $x'_1, \ldots, x'_t$ be the inputs to $g$ in the evaluation of $C^g(X'_0)$. Let 
$X'_I \stackrel{\text{def}}{=} C^g(X'_0)$. 
3. Now evaluate $C^g(Y'_0)$ in steps (one invocation of $g$ per step), where for any 
$i \in I$ do the following: 
- If $y'_i$ is “fresh” (that is $y'_i \neq x'_j$ for any $1 \le j \le t$ and $y'_i \neq y'_j$ for any 
$1 \le j < i$), we change the value of $g(y'_i)$ and set it to some uniformly 
random value $z_i \in_U \{0,1\}^n$ (note that $g$ is no longer a permutation). 
- If $y'_i$ is not fresh, set $Y'_I = \bot$ and stop. 
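The experiment just described can be simulated on toy parameters. The sketch below rests on several assumptions made only for illustration: a circuit is a Python callable `circuit(oracle, X)`, the value $\bot$ is represented by `None`, inputs are integers of $m$ bits, and $Y'_I$ is taken to be the output of the second evaluation when it does not abort.

```python
import random


class NotFresh(Exception):
    """Raised when a query at a step in I is not fresh (the abort case)."""


def experiment_gamma_I(circuit, n, m, I, rng):
    """Run one sample of the experiment and return (X'_I, Y'_I)."""
    size = 1 << n
    g = list(range(size))
    rng.shuffle(g)                            # step 1: random permutation g
    X0, Y0 = rng.getrandbits(m), rng.getrandbits(m)

    xs = []                                   # step 2: record the queries of C^g(X'_0)

    def oracle_x(q):
        xs.append(q)
        return g[q]

    X_out = circuit(oracle_x, X0)

    ys = []                                   # step 3: evaluate C^g(Y'_0) query by query

    def oracle_y(q):
        i = len(ys) + 1                       # index of the current invocation
        if i in I:
            if q in xs or q in ys:            # not fresh: abort with bottom
                raise NotFresh
            g[q] = rng.randrange(size)        # fresh: resample g at this point
        ys.append(q)
        return g[q]

    try:
        Y_out = circuit(oracle_y, Y0)
    except NotFresh:
        Y_out = None
    return X_out, Y_out
```

A circuit that always queries the same point is a convenient degenerate test case: with $I = \{1\}$ its first query in the second evaluation repeats a query of the first evaluation, so the experiment aborts.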


We will first prove Eq. (11). Let’s consider a new random experiment $\Gamma'_I$ which 
will define outputs $X''_I, Y''_I \in \{0,1\}^m$. This experiment is defined exactly as the 
experiment $\Gamma_I$ defining $X'_I, Y'_I$, but when $y'_i$ is not fresh we nonetheless redefine 
$g(y'_i)$ to a random $z_i$ (instead of setting $Y'_I = \bot$ and aborting). As the two 
experiments only differ when $Y'_I = \bot$, but $X'_I$ cannot be $\bot$, we have 

$$\Pr_{\Gamma_I}[X'_I = Y'_I] \le \Pr_{\Gamma'_I}[X''_I = Y''_I]$$ 

Moreover $X''_I = C^g(X'_0)$ is uniformly random (as $X'_0$ is uniform and $C^g$ is a 
permutation) and $Y''_I$ is independent of $X''_I$ (the reason we consider the experiment 
$\Gamma'_I$ is because in $\Gamma_I$ we don’t have this independence), thus 

$$\Pr_{\Gamma'_I}[Y''_I = X''_I] = 2^{-m}$$ 

The two equations above imply Eq. (11). Now we show Eq. (12), i.e. 

$$\Pr_\Gamma[V_I] \le 2 \cdot \Pr_{\Gamma_I}[X'_I = Y'_I] \qquad (13)$$ 


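We will show a stronger statement, namely that for every transcript $\hat{v} \in V_I$ we 
have 

$$\Pr_\Gamma[\hat{v}] \le 2 \cdot \Pr_{\Gamma_I}[\hat{v}] \qquad (14)$$ 

This implies (13) as 

$$\Pr_\Gamma[V_I] = \sum_{\hat{v} \in V_I} \Pr_\Gamma[\hat{v}] \le 2 \cdot \sum_{\hat{v} \in V_I} \Pr_{\Gamma_I}[\hat{v}] = 2 \cdot \Pr_{\Gamma_I}[V_I] \le 2 \cdot \Pr_{\Gamma_I}[X'_I = Y'_I]$$ 

We’ll use the following notation for the transcript $\hat{v}$ and the transcripts generated 
by $\Gamma$ and $\Gamma_I$, respectively. 

$$\hat{v} \stackrel{\text{def}}{=} \{\hat{X}_0, \hat{Y}_0, \hat{x}_1, \ldots, \hat{x}_t, \hat{a}_1, \ldots, \hat{a}_t, \hat{y}_1, \ldots, \hat{y}_t, \hat{b}_1, \ldots, \hat{b}_t\}$$ 

$$v \stackrel{\text{def}}{=} \{X_0, Y_0, x_1, \ldots, x_t, f(x_1), \ldots, f(x_t), y_1, \ldots, y_t, f(y_1), \ldots, f(y_t)\}$$ 

$$v' \stackrel{\text{def}}{=} \{X'_0, Y'_0, x'_1, \ldots, x'_t, g(x'_1), \ldots, g(x'_t), y'_1, \ldots, y'_t, g(y'_1), \ldots, g(y'_t)\}$$ 

As $X_0, Y_0$ and $X'_0, Y'_0$ are uniformly random, we have 

$$\Pr_\Gamma[(X_0, Y_0) = (\hat{X}_0, \hat{Y}_0)] = \Pr_{\Gamma_I}[(X'_0, Y'_0) = (\hat{X}_0, \hat{Y}_0)] = 2^{-2m}$$ 

Further 

$$\Pr_\Gamma[(\hat{x}_1, \ldots, \hat{x}_t, \hat{a}_1, \ldots, \hat{a}_t) = (x_1, \ldots, x_t, f(x_1), \ldots, f(x_t)) \mid (X_0, Y_0) = (\hat{X}_0, \hat{Y}_0)] \le$$ 
$$\Pr_{\Gamma_I}[(\hat{x}_1, \ldots, \hat{x}_t, \hat{a}_1, \ldots, \hat{a}_t) = (x'_1, \ldots, x'_t, g(x'_1), \ldots, g(x'_t)) \mid (X'_0, Y'_0) = (\hat{X}_0, \hat{Y}_0)]$$ 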
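Using the chain rule, the above is implied by 

$$\prod_{i=1}^{t} \Pr_{\Gamma}[\hat{a}_i = f(\hat{x}_i) \mid \ldots] \le \prod_{i=1}^{t} \Pr_{\Gamma'}[\hat{a}_i = g(\hat{x}_i) \mid \ldots] \qquad (15)$$
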
where here and below we use the convention that “...” always means that the 
transcript defined up to this point is consistent with the transcript $\hat{v}$. E.g. on 
the left side of Eq. (15) the “...” stands for 


Note that we don’t have to explicitly require Vj = 1...¢— 1 : xj = ĉj as this is 
already implied by (ro) [4 

For $i = 1,\ldots,2t$ we will denote with $q_i \le i$ the number of distinct elements 
that appeared as inputs to $f$ in the first $i$ queries. I.e., for $i \le t$, $q_i = |\{\hat{x}_1,\ldots,\hat{x}_i\}|$, 
and for $t < i \le 2t$, $q_i = |\{\hat{x}_1,\ldots,\hat{x}_t,\hat{y}_1,\ldots,\hat{y}_{i-t}\}|$. 

To see that Eq. holds, note that for any i where f; is not fresh (i.e. 
x; = x; for some j < i) we have 


Prlja; = F (az)|<-.) = Prlag—o@le, ||...) = 2 


For 7’s where x; is fresh, let q; denote the number of distinct elements in 
@,...,4;-1. A g is a random permutation and a; 4 g(x) for j < i because 
=~¢(Xo), we have 


* If the inputs (Xo, Yo) = (Xo, Yo) are identical, and all the oracle queries so far gave 
the same outputs, also all intermediate values (including the next oracle query) will 
be the same. 
On the other hand 


1 
To see this note that Prr[f(#;) = a;|...] is exactly a if one additionally 
conditions on the fact that f(x) # aj for all j < i. Not conditioning on this 
event can only decrease the probability as a; 4 a; for j < i as =ġ(Xo). 
Now we come to the second part of the transcript. Here we will show that 


Pr[(i ++ +5 Gt, Qty +++ 06) = (Yis Yer FO), ++ FY) | ---] S 


2: Prii. -s Ge 1, --- at) = (Yi -Yp ICY) --- 9) | | 


The proof is almost identical as for the first part, except that now for fresh y; 
we have a slightly smaller probability 


that g maps to the right value b; in the experiment Iy, as by definition of I; 
the output of g is assigned a uniformly random value in this case. Using the fact 
that t < 27”/4 this difference is covered by the extra factor 2. E 


4 Extension to Collections of Lossy Functions 


By Theorem 3.1, no circuit $C^*$ (of size polynomial in $n$) can amplify relative 
lossiness better than sequential composition. That is, if $C^f$ is injective for any 
permutation $f : \{0,1\}^n \to \{0,1\}^n$, then there exists an $(n,\ell)$-lossy $f$ (i.e. it has 
relative lossiness $\ell/n$) such that $C^f$ has relative lossiness only $(\ell+O(\log n))/n$. In 
fact, a random $(n,\ell)$-lossy $f$ will have this property with very high probability. 
In a cryptographic setting, lossy functions are not given as a single function, but 
by a collection of triples of algorithms as defined below. 


Definition 4.1 (Collection of Lossy Functions). Let $\lambda \in \mathbb{N}$ denote a 
security parameter and $n = n(\lambda)$, $n' = n'(\lambda)$, $\ell = \ell(\lambda)$ be functions of $\lambda$. A 
collection of $(n,n',\ell)$-lossy functions is a sequence (indexed by $\lambda$) of functions 
$\pi = \{g_0, g_1, f\}_{\lambda \in \mathbb{N}}$, where $g_0, g_1$ are probabilistic key-generation functions, such 
that 


1. Evaluation of lossy functions: For every function index $\sigma \leftarrow g_0(1^\lambda)$, 
$f(\sigma,\cdot)$ is a function $f_\sigma : \{0,1\}^n \to \{0,1\}^{n'}$ whose image is of size at most 
$2^{n-\ell}$. 

2. Evaluation of injective functions: For every function index $\sigma \leftarrow g_1(1^\lambda)$, 
the function $f(\sigma,\cdot)$ computes an injective function $f_\sigma : \{0,1\}^n \to \{0,1\}^{n'}$. 

3. Security: The ensembles $\{\sigma : \sigma \leftarrow g_0(1^\lambda)\}_{\lambda \in \mathbb{N}}$ and $\{\sigma : \sigma \leftarrow g_1(1^\lambda)\}_{\lambda \in \mathbb{N}}$ 
are computationally indistinguishable. 


We refer to $\ell$ as the absolute lossiness of $\pi$, and to $\ell/n$ as the relative lossiness 
of $\pi$. 
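
To make the two evaluation conditions concrete, here is a minimal sketch with toy stand-ins. The functions `toy_lossy` and `toy_injective` are hypothetical illustrations, not taken from the paper; they are trivially distinguishable and so do not satisfy the security condition. The sketch only checks the image-size requirement by brute force, with $n$-bit strings encoded as integers.

```python
def image_size(f, n):
    """Brute-force the size of the image of f over all n-bit inputs."""
    return len({f(x) for x in range(1 << n)})


def toy_lossy(n, ell):
    """Hypothetical (n, ell)-lossy function: forget the ell low-order bits."""
    keep_mask = ((1 << n) - 1) & ~((1 << ell) - 1)
    return lambda x: x & keep_mask


def toy_injective(n):
    """Hypothetical injective function: an affine map with odd multiplier mod 2^n."""
    return lambda x: (5 * x + 3) % (1 << n)


n, ell = 8, 3
lossy_size = image_size(toy_lossy(n, ell), n)
injective_size = image_size(toy_injective(n), n)
print(lossy_size, 2 ** (n - ell))   # image size versus the bound 2^(n - ell)
print(injective_size, 2 ** n)       # an injective map hits 2^n distinct values
```

In this toy case the absolute lossiness is `ell` and the relative lossiness is `ell / n`.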
Definition 4.2 (Black-Box Amplification of Lossy Collection). A triple 
of probabilistic polynomial-time oracle algorithms $\Pi^* = \{G_0^*, G_1^*, F^*\}$ is a black-box 
amplification for relative lossiness from $\alpha = \alpha(\lambda)$ to $\beta = \beta(\lambda)$ ($\beta > \alpha$) if 
for every oracle $\pi = \{g_0, g_1, f\}_{\lambda \in \mathbb{N}}$ that implements an $(n,n',\alpha n)$-lossy collection, 
$\Pi^\pi$ is an $(m,m',\beta m)$-lossy collection (where $m = m(\lambda)$, $m' = m'(\lambda)$). 


Note that if $\pi$ is efficient (i.e. can be implemented by polynomial-time algorithms), 
so is $\Pi^\pi$. We will prove the following theorem. 


Theorem 4.3 (Impossibility of Black-Box Amplification). Let $t, \ell, n$ be 
functions of $\lambda$ such that $n(\lambda) \le \ell(\lambda) + 2\log(t(\lambda)) + \omega(\lambda)$. If each of the algorithms 
in $\Pi^* = \{G_0^*, G_1^*, F^*\}$ makes at most $t = t(\lambda)$ oracle queries per invocation 
and $\Pi^*$ amplifies relative lossiness from $\alpha(\lambda) = \ell/n$ to $\beta(\lambda) = L/n$, then $L = 
\ell + O(\log t)$. 


To save on notation, we will identify the security parameter $\lambda$ with the domain 
size $n$ of the lossy function we try to amplify (which will be given as an oracle). 
To prove Theorem 4.3 we will show that for any construction $\Pi^*$, if we choose 
a random $(n,\ell)$-lossy $\pi_n = \{g_0, g_1, f\}$ (“random” to be defined in Section 4.1), 
then with overwhelming probability either the outputs of $G_0^{\pi_n}$ and $G_1^{\pi_n}$ can be 
distinguished relative to $\pi_n$, or for a random lossy key $k \leftarrow G_0^{\pi_n}$, the function 
$F^{\pi_n}(k,\cdot)$ has very small collision probability and thus cannot be too lossy. 


4.1 The Random $\pi = \{g_0, g_1, f\}$ 


For $n, \ell \in \mathbb{N}$ let $\mathcal{L}_{n,\ell}$ denote the set of triples of functions $g_0, g_1 : \{0,1\}^{n-1} \to 
\{0,1\}^n$, $f : \{0,1\}^{2n} \to \{0,1\}^n$ where the range of $g_0, g_1$ covers all of $\{0,1\}^n$ (note 
this means that the ranges of $g_0$ and $g_1$ are disjoint) and (with $\mathcal{F}_{n,\ell}$ as defined in 
the first paragraph of Section P) 


$$\forall x \in \{0,1\}^{n-1} : \; f(g_0(x),\cdot) \in \mathcal{F}_{n,\ell} \text{ and } f(g_1(x),\cdot) \in \mathcal{F}_{n,0}$$


Claim 4.4. For $\ell(n) \le n - \omega(n)$, let $\pi = \{\pi_n\}_{n \in \mathbb{N}}$ where $\pi_n = \{g_0, g_1, f\}$ is 
chosen uniformly in $\mathcal{L}_{n,\ell}$ (for every $n \in \mathbb{N}$). Then with overwhelming probability 
$\pi$ is $(n,\ell)$-lossy even relative to an EXPTIME-complete oracle. 


4.2 (Non-)Communicating $\Pi^*$ 


Consider a $\Pi^* = \{G_0^*, G_1^*, F^*\}$ as in Definition 4.2. We will classify such $\Pi^*$ in 
two classes, depending on whether $\Pi^*$ is close to being “non-communicating” or 
not. Intuitively, we say $\Pi^*$ is non-communicating if the lossiness of $\Pi^\pi$ comes 
entirely from the lossiness of $\pi$, that is, if $\pi$ is not lossy, then also $\Pi^\pi$ will not 
be lossy. 


Definition 4.5 ((close to) non-communicating). $\Pi^*$ is non-communicating 
if for every $n \in \mathbb{N}$ and $\pi_n \in \mathcal{L}_{n,0}$ the function computed by $F^{\pi_n}(k,\cdot)$ is injective 
for every $k \leftarrow G_0^{\pi_n}(1^n)$. In addition, $\Pi^*$ is close to being non-communicating if 
for all but finitely many $n \in \mathbb{N}$, with probability 1/2 over the choice of a random 
$\pi_n \in \mathcal{L}_{n,0}$, for at least 1/2 of the keys $k \leftarrow G_0^{\pi_n}$, there’s a subset $M_k \subseteq \{0,1\}^m$ 
of size at least $2^m/2$ such that $F^{\pi_n}(k,\cdot)$ is injective on $M_k$ (i.e. for $x, x' \in M_k$, 
$F^{\pi_n}(k,x) = F^{\pi_n}(k,x')$ implies $x = x'$). 


In order to prove that Theorem 4.3 holds for some particular construction $\Pi^*$, 
we will use a different argument depending on whether $\Pi^*$ is close to being 
non-communicating or not. The proof for the first case is almost identical to that of 
Theorem 3.1, where we rely on the fact that $C^*$ is injective for any injective key 
$k$. The proof for the second case relies on the indistinguishability of injective and 
lossy functions, and requires new ideas. More specifically, in this case we prove 
the following lemma: 


Lemma 4.6. If $\Pi^*$ (as in the statement of Theorem 4.3) is far (i.e. not close) 
from being non-communicating, then for infinitely many $n \in \mathbb{N}$ the following 
holds. For a random $\pi_n \in \mathcal{L}_{n,\ell}$ the outputs of $G_0^{\pi_n}$ and $G_1^{\pi_n}$ can be distinguished 
with constant advantage making $\mathrm{poly}(t,n)$ oracle queries to $\pi_n$ (and one query 
to an EXPTIME oracle). 


Due to space limitations, in the remainder of this section we describe a high-level 
outline of the proof of Lemma 4.6 and refer the reader to the full version for 
the formal proof. 


Proof outline. For $b \in \{0,1\}$ consider a key $k \leftarrow G_b^{\pi_n}(R)$ and let $Q_k$ denote 
all the queries that $G_b^{\pi_n}(R)$ made to its oracle $\pi_n$ while sampling this key 
using randomness $R$. Now consider an $(n,0)$-lossy $\hat{\pi}_n \in \mathcal{F}_{n,0}$ which is sampled at 
random except that we require it to be consistent with the queries in $Q_k$. As $\hat{\pi}_n$ 
is consistent with $\pi_n$ on $Q_k$, we have $G_b^{\pi_n}(R) = G_b^{\hat{\pi}_n}(R) = k$. Thus if 


— $b = 1$, then $k$ is a valid injective key relative to $\hat{\pi}_n$, and thus $F^{\hat{\pi}_n}(k,\cdot)$ has 
image size $2^m$. 

— $b = 0$, then $k$ is a valid lossy key relative to $\hat{\pi}_n$. As $\Pi^*$ is far from being 
non-communicating, with constant probability $F^{\hat{\pi}_n}(k,\cdot)$ will have an image 
size of $\le 2^{m-1}$ despite the fact that $\hat{\pi}_n$ is not lossy at all. 


Using the above two observations, here’s a way to distinguish the case $b = 0$ 
from $b = 1$ (i.e. lossy from injective keys) with constant advantage given $Q_k$ 
and access to an EXPTIME oracle: query the oracle on input $k, Q_k$ and ask for 
the image size of $F^{\hat{\pi}_n}(k,\cdot)$ for a $\hat{\pi}_n$ randomly sampled as described above. If the 
image size is $\le 2^{m-1}$, guess $b = 0$; otherwise guess $b = 1$. 
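
The decision rule in this step can be sketched as follows. This is only an illustration under strong simplifying assumptions: the EXPTIME oracle is replaced by brute-force enumeration over a tiny domain, the resampling of the oracle is omitted, and `toy_F` is a hypothetical stand-in for the keyed function, not the paper's construction. The threshold $2^{m-1}$ is the one read from the text above.

```python
def guess_key_type(F, k, m):
    """Guess b = 0 (lossy key) if the image of F(k, .) over m-bit inputs
    has size at most 2^(m-1), and b = 1 (injective key) otherwise."""
    size = len({F(k, x) for x in range(1 << m)})
    return 0 if size <= (1 << (m - 1)) else 1


def toy_F(k, x):
    """Hypothetical keyed function: key 'lossy' drops one bit, any other key is the identity."""
    return x >> 1 if k == "lossy" else x


print(guess_key_type(toy_F, "lossy", 6))      # image has 2^5 elements
print(guess_key_type(toy_F, "injective", 6))  # image has 2^6 elements
```

Brute force takes time exponential in $m$, which is why the outline delegates the image-size computation to an EXPTIME oracle.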

Unfortunately we are only given the key $k$, but not $Q_k$. What we’ll do is consider 
a random $\tilde{\pi}_n$ which is consistent with $\pi_n$ on a set of inputs/outputs $Q^q_k$ 
to $\pi_n$ which is sampled by invoking $F^{\pi_n}(k,\cdot)$ on $q = \mathrm{poly}(n,t)$ random inputs 
(i.e. $Q^q_k$ contains all inputs/outputs to $\pi_n$ made during these $q$ invocations). 

We will prove that for such a $\tilde{\pi}_n$ the image size of $F^{\tilde{\pi}_n}(k,\cdot)$ is still close to 
$2^m$ if $b = 1$, but with constant probability $< 2^m$ if $b = 0$, so we can use our 
EXPTIME oracle to distinguish these cases by sending $k, Q^q_k$ (which, unlike 
$Q_k$, we do have) to the EXPTIME oracle asking for the image size of $F^{\tilde{\pi}_n}(k,\cdot)$ 
when $\tilde{\pi}_n \in \mathcal{F}_{n,0}$ is chosen at random but consistent with $Q^q_k$. 

The reason it is good enough to consider a $\tilde{\pi}_n$ that is consistent with $Q^q_k$ 
and not $Q_k$ is that for sufficiently many samples $q = \mathrm{poly}(n,t)$, $Q^q_k$ will with 
high probability contain all “heavy” queries in $Q_k$, where we say a query is heavy 
if there’s a good probability that $F^{\pi_n}(k,\cdot)$ will make that query if invoked on a 
random input. 

So for most inputs $x$, $F^{\pi_n}(k,x)$ will not query $\pi_n$ on a query which is in $Q_k$ (i.e. 
which was made during key generation) but is not in $Q^q_k$. As a consequence, 
$F^{\tilde{\pi}_n}(k,\cdot)$ “behaves” differently from what we would get by using $\hat{\pi}_n$ (which is 
consistent with all of $Q_k$) instead of $\tilde{\pi}_n$ only for a small fraction of the inputs. 
In particular, the image size is close to what we would have gotten by using $\hat{\pi}_n$. 


Acknowledgements. We would like to thank Oded Goldreich and Omer Rein- 
Abstract. If we have a problem that is mildly hard, can we create a 
problem that is significantly harder? A natural approach to hardness am- 
plification is the “direct product”; instead of asking an attacker to solve 
a single instance of a problem, we ask the attacker to solve several inde- 
pendently generated ones. Interestingly, proving that the direct product 
amplifies hardness is often highly non-trivial, and in some cases may be 
false. For example, it is known that the direct product (i.e. “parallel rep- 
etition”) of general interactive games may not amplify hardness at all. 
On the other hand, positive results show that the direct product does 
amplify hardness for many basic primitives such as one-way functions, 
weakly-verifiable puzzles, and signatures. 

Even when positive direct product theorems are shown to hold for 
some primitive, the parameters are surprisingly weaker than what we 
may have expected. For example, if we start with a weak one-way function 
that no poly-time attacker can break with probability > 1/2, then the 
direct product provably amplifies hardness to some negligible probability. 
Naturally, we would expect that we can amplify hardness exponentially, 
all the way to $2^{-n}$ probability, or at least to some fixed/known negligible function 
such as $n^{-\log n}$ in the security parameter $n$, just by taking sufficiently 
many instances of the weak primitive. Although it is known that such pa- 
rameters cannot be proven via black-box reductions, they may seem like 
reasonable conjectures, and, to the best of our knowledge, are widely be- 
lieved to hold. In fact, a conjecture along these lines was introduced in a 
survey of Goldreich, Nisan and Wigderson (ECCC ’95). In this work, we 
show that such conjectures are false by providing simple but surprising 
counterexamples. In particular, we construct weakly secure signatures 
and one-way functions, for which standard hardness amplification re- 
sults are known to hold, but for which hardness does not amplify beyond 
just negligible. That is, for any negligible function $\varepsilon(n)$, we instantiate 
these primitives so that the direct product can always be broken with 
probability $\varepsilon(n)$, no matter how many copies we take. 


1 Introduction 


Hardness amplification is a fundamental cryptographic problem: given a “weakly 
secure” construction of some cryptographic primitive, can we use it to build a 
“strongly secure” construction? The first result in this domain is a classical 
conversion from weak one-way functions to strong one-way functions by Yao [32] (see 
also [13]). This result starts with a function $f$ which is assumed to be weakly 
one-way, meaning that it can be inverted on at most (say) a half of its inputs. 
It shows that the direct-product function $F(x_1,\ldots,x_k) = (f(x_1),\ldots,f(x_k))$, for 
an appropriately chosen polynomial $k$, is one-way in the standard sense, 
meaning that it can be inverted on only a negligible fraction of its inputs. The above 
result is an example of what is called the direct product theorem, which, when 
true, roughly asserts that simultaneously solving many independent repetitions 
of a mildly hard task yields a much harder “combined task”.¹ Since the result 
of Yao, such direct product theorems have been successfully used to argue 
security amplification of many other important cryptographic primitives, such as 
collision-resistant hash functions [8], encryption schemes [12], weakly verifiable 
puzzles [7,20,22], signature schemes/MACs [11], commitment schemes [ISIQ], 
pseudorandom functions/generators [11,26], block ciphers [24,27,25,30], and 
various classes of interactive protocols [5,28,19,17]. 
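
As a small illustration of the direct-product function $F(x_1,\ldots,x_k) = (f(x_1),\ldots,f(x_k))$, the sketch below wraps an arbitrary function `func` into its $k$-wise direct product. SHA-256 is used purely as a placeholder for $f$; it is an assumption made for the example and is not claimed to be a weak one-way function in the paper's sense.

```python
import hashlib


def f(x: bytes) -> bytes:
    """Placeholder for the underlying function f (an assumption for illustration)."""
    return hashlib.sha256(x).digest()


def direct_product(func, k):
    """Return F with F(x_1, ..., x_k) = (func(x_1), ..., func(x_k))."""
    def F(xs):
        if len(xs) != k:
            raise ValueError(f"expected {k} inputs, got {len(xs)}")
        return tuple(func(x) for x in xs)
    return F


F = direct_product(f, 3)
outputs = F((b"a", b"b", b"c"))
print(len(outputs))  # one output per independent input
```

Inverting `F` requires inverting every coordinate at once, which is the intuition behind hoping that hardness compounds across the $k$ independent copies.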

Direct product theorems are surprisingly non-trivial to prove. In fact, in some 
settings, such as general interactive protocols [5,29], they are simply false and 
hardness does not amplify at all, irrespective of the number of repetitions. Even 
for primitives such as one-way functions, for which we do have “direct product 
theorems”, the parameters of these results are surprisingly weaker than what 
we may have expected. Let us say that a cryptographic construction is weakly 
secure if no poly-time attacker can break it with probability greater than 1/2. 
Known theorems tell us that the direct product of $k = \Theta(n)$ independent 
instances of a weakly secure construction will become secure in the standard sense, 
meaning that no poly-time attacker can succeed in breaking security with better 
than some negligible probability in the security parameter $n$. However, we could 
naturally expect that the direct product of $k$ instances will amplify hardness 
exponentially, ensuring that no poly-time attacker can break security with more than 
$2^{-k}$ probability. Or, we would at least expect that a sufficiently large number 
of $k = \mathrm{poly}(n)$ repetitions can amplify hardness to some fixed/known negligible 
probability such as $\varepsilon(n) = 2^{-n^{\delta}}$ for some constant $\delta > 0$, or even less ambitiously, 
$\varepsilon(n) = n^{-\log n}$. We call such expected behavior amplification beyond negligible. 


LIMITATION OF EXISTING PROOFS. One intuitive reason that the positive re- 
sults are weaker than what we expect is the limitation of our reduction-based 
proof techniques. In particular, assume we wanted to show that the k-wise 
direct product amplifies hardness down to some very small probability $\varepsilon$. Then 
we would need an efficient reduction that uses an adversary $A$ breaking the 
security of the $k$-wise direct product with probability $\varepsilon$, to break the security of a 
single instance with a much larger probability, say one half. Unfortunately, the 


1 A related approach to amplifying the hardness of decisional problems is the “XOR 
Lemma” which roughly asserts the hardness of predicting an XOR of the challenge 
bits of many independent instances of a decisional problem will amplify. In this 
work, we will focus on “search” problems such as one-way functions and signatures 
and therefore only consider amplification via direct product. 
reduction cannot get “anything useful” from the attacker $A$ until it succeeds at 
least once. And since $A$ only succeeds with small probability $\varepsilon$, the reduction is 
forced to run $A$ at least (and usually much more than) $1/\varepsilon$ times, since otherwise 
$A$ might never succeed. In other words, the reduction is only efficient as long 
as $\varepsilon$ is an inverse polynomial. This may already be enough to show that the 
direct product amplifies hardness to some negligible probability, since the success 
probability of $A$ must be smaller than every inverse polynomial $\varepsilon$. But it also 
tells us that black-box reductions cannot prove any stronger bounds beyond 
negligible, since the reduction would necessarily become inefficient.² For example, 
we cannot even prove that the $k$-wise direct product of a weak one-way function 
will amplify hardness to $n^{-\log n}$ security (where $n$ is the security parameter), no 
matter how many repetitions $k$ we take. 
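
A quick arithmetic check shows why $n^{-\log n}$ is already out of reach for such reductions. The sketch below assumes base-2 logarithms and $n$ a power of two (both are choices made for the example) and compares the $1/\varepsilon$ runs of the attacker with a fixed polynomial $n^c$.

```python
def runs_needed(n):
    """Return 1/eps for eps(n) = n^(-log2 n), assuming n is a power of two."""
    log_n = n.bit_length() - 1
    if n != 1 << log_n:
        raise ValueError("n must be a power of two in this sketch")
    return n ** log_n


def exceeds_polynomial(n, c):
    """Check whether 1/eps(n) is larger than the polynomial bound n^c."""
    return runs_needed(n) > n ** c


print(runs_needed(1024) == 2 ** 100)   # 1024^10 runs of the attacker
print(exceeds_polynomial(16, 5))       # small n: still below n^5
print(exceeds_polynomial(1024, 5))     # larger n: above n^5
```

For any fixed exponent $c$, the count $n^{\log n}$ overtakes $n^c$ once $\log n > c$, so no polynomial-time reduction can afford that many runs.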


OUR QUESTION. The main goal of this work is to examine whether the 
limitations of current hardness amplification results are just an artifact of our proof 
technique, or whether they reflect reality. Indeed, we may be tempted to ignore 
the lack of formal proofs and nevertheless make the seemingly believable conjec- 
ture that hardness does amplify beyond negligible. In more detail, we may make 
the following conjecture: 


Conjecture (Informal): For all primitives for which standard direct product 
theorems hold (e.g., one-way functions, signatures etc.), the $k$-wise direct product 
of any weakly secure instantiation will amplify hardness all the way down to 
some fixed negligible bound $\varepsilon(n)$, such as $\varepsilon(n) = 2^{-\Omega(n)}$, or, less ambitiously, 
$\varepsilon(n) = n^{-\log n}$, when $k = \mathrm{poly}(n)$ is sufficiently large. 


To the best of our knowledge, such a conjecture is widely believed to hold. 
The survey of Goldreich et al. [14] explicitly introduced a variant of the above 
conjecture in the (slightly different) context of the XOR Lemma and termed it 
a “dream version” of hardness amplification which, although seemingly highly 
reasonable, happens to elude a formal proof. 


OUR RESULTS. In this work, we show that, surprisingly, the above conjecture 
does not hold, and give strong counterexamples to the conjectured hardness 
amplification beyond negligible. We do so in the case of signature schemes and 
one-way functions for which we have standard direct-product theorems showing 
that hardness amplifies to negligible [32,11]. Our result for the signature case, 
explained in Section 3, relies on techniques from the area of stateless 
(resettably-secure) multiparty computation [6,3,10,16,15]. On a high level, we manage to 
embed an execution of a stateless multiparty protocol $\Pi$ into the design of our 
signature scheme, where $\Pi$ generates a random instance of a hard relation $R$, 
and the signer will output its secret key if the message contains a witness for $R$. 
The execution of $\Pi$ can be driven via carefully designed signing queries. Since 
$\Pi$ is secure and $R$ is hard, the resulting signature scheme is still secure by itself. 


2 This “folklore” observation has been attributed to Steven Rudich in [14]. 
However, our embedding is done in a way so as to allow us to attack the direct 
product of many independent schemes by forcing them to execute a single 
correlated execution of $\Pi$ resulting in a common instance of the hard relation $R$. 
This allows us to break all of the schemes simultaneously by breaking a single 
instance of $R$, and thus with some negligible probability $\varepsilon(n)$, which is 
independent of the number of copies $k$. Indeed, we can make $\varepsilon(n)$ an arbitrarily large 
negligible quantity (say, $n^{-\log n}$) by choosing the parameters for the relation $R$ 
appropriately. 

One may wonder whether such counterexamples are particular to signature 
schemes. More specifically, our above counterexample seems to crucially rely 
on the fact that the security game for signatures is highly interactive (allow- 
ing us to embed an interactive MPC computation) and that the communication 
complexity between the challenger and attacker in the security game can be ar- 
bitrarily high (allowing us to embed data from all of the independent copies of 
the scheme into the attack on each individual one). Perhaps hardness still am- 
plifies beyond negligible for simpler problems, such as one-way functions, where 
the security game is not interactive and has an a-priori bounded communication 
complexity. Our second result gives strong evidence that this too is unlikely, by 
giving a counterexample for one-way functions. The counterexample relies on a 
new assumption on hash functions called Extended Second Preimage Resistance 
(ESPR), which we introduce in this paper. Essentially, this assumption says that 
given a random challenge x, it is hard to find a bounded-length Merkle path that 
starts at x, along with a collision on it. To break many independent copies of this 
problem, the attacker takes the independent challenges $x_1,\ldots,x_k$ and builds a 
Merkle tree with them as leaves. If it manages to find a single collision at the 
root of the tree (which occurs with some probability independent of $k$), it will be 
able to find a witness (a Merkle path starting at $x_i$ with a collision) for each of 
the challenges $x_i$. So far, this gives us an amplification counterexample for a hard 
relation based on the ESPR problem (which is already interesting), but, with 
a little more work, we can also convert it into a counterexample for a one-way 
function based on this problem. For the counterexample to go through, we need 
the ESPR assumption to hold for some fixed hash function (not a family), and 
so we cannot rely on collision resistance. Nevertheless, we argue that the ESPR 
assumption for a fixed hash function is quite reasonable and is likely satisfied 
by existing (fixed) cryptographic hash functions, by showing that it holds in a 
variant of the random oracle model introduced by Unruh [31], where an attacker 
gets arbitrary “oracle-dependent auxiliary input”. As argued by [31], such a model 
is useful for determining which security properties can be satisfied by a single 
hash function rather than a family. 
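
The structural point of the Merkle-tree attack is that all $k$ challenges share one root. The sketch below shows this under stated assumptions: SHA-256 stands in for the fixed hash function, the number of leaves is a power of two, and the collision-finding step itself is not modeled. It builds the tree and checks that the path from every leaf recomputes the same root, so a single collision at the root would extend every one of the $k$ paths.

```python
import hashlib


def H(left: bytes, right: bytes) -> bytes:
    """Two-to-one compression; SHA-256 is an assumed stand-in for the fixed hash."""
    return hashlib.sha256(left + right).digest()


def build_tree(leaves):
    """Return all levels of the Merkle tree, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([H(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_path(levels, index):
    """Collect (sibling, node_is_right_child) pairs from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index & 1))
        index //= 2
    return path


def root_from_path(leaf, path):
    """Recompute the root from a leaf and its Merkle path."""
    node = leaf
    for sibling, is_right in path:
        node = H(sibling, node) if is_right else H(node, sibling)
    return node


challenges = [bytes([i]) * 32 for i in range(8)]   # k = 8 toy challenges
levels = build_tree(challenges)
root = levels[-1][0]
print(all(root_from_path(x, merkle_path(levels, i)) == root
          for i, x in enumerate(challenges)))
```

The length of each path is $\log_2 k$, which is why the witness in the assumption is a bounded-length Merkle path.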

Overall, our work gives strong indications that the limitations of our reduc- 
tionist proofs for the direct product theorems might actually translate to real 
attacks for some schemes. 


RELATED WORK. Interestingly, a large area of related work comes from a 
seemingly different question of leakage amplification [2,1,23,21]. These works 


480 Y. Dodis et al. 


ask the following: given a primitive $P$ which is resilient to $\ell$ bits of leakage on 
its secret key, is it true that the combination of $k$ independent copies of $P$ is resilient 
to almost $L = \ell k$ bits of leakage? At first sight this seems to be a completely 
unrelated question. However, there is a nice connection between hardness and 
leakage-resilience: if a primitive (such as a signature or one-way function) is 
hard to break with probability $\varepsilon$, then it is resilient to $\log(1/\varepsilon)$ bits of leakage. 
This means that if some counter-example shows that the leakage bound L does 
not amplify with k, then neither does the security. Therefore, although this 
observation was never made, the counterexamples to leakage amplification from 
[23,21] seem to already imply some counterexample for hardness. Unfortunately, 
both works concentrate on a modified version of parallel repetition, where some 
common public parameters are reused by all of the instances and, thus, they are 
not truly independent. Indeed, although showing counterexamples for (the harder 
question of) leakage amplification is still interesting in this scenario, constructing 
ones for hardness amplification becomes trivial.³ However, the work of [21] also 
proposed that a variant of their counterexample for leakage amplification may 
extend to the setting without common parameters under a highly non-standard 
assumption about computationally sound (CS) proofs. Indeed, this suggestion 
led us to re-examine our initial belief that such counterexamples should not exist, 
and eventually resulted in this work. We also notice that our counterexample for 
signature schemes (but not one-way functions) can be easily extended to give a 
counterexample for leakage amplification without common parameters. 
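
The connection between hardness and leakage resilience used above follows from a standard guessing argument, sketched here as an illustration rather than as the paper's own proof. Suppose an attacker $A$ that receives $\ell$ bits of leakage $\mathrm{leak}(sk)$ wins with probability $\delta$. An attacker $A'$ without leakage can guess a uniform $z \in \{0,1\}^{\ell}$, independent of everything else, and run $A(z)$:

```latex
\begin{align*}
\Pr[A' \text{ wins}]
  &\ge \Pr\bigl[z = \mathrm{leak}(sk) \,\wedge\, A(\mathrm{leak}(sk)) \text{ wins}\bigr] \\
  &= 2^{-\ell} \cdot \Pr\bigl[A(\mathrm{leak}(sk)) \text{ wins}\bigr]
   = 2^{-\ell} \cdot \delta .
\end{align*}
% If the primitive cannot be broken with probability better than \varepsilon
% without leakage, then 2^{-\ell} \delta \le \varepsilon, i.e.
% \delta \le 2^{\ell} \varepsilon, which is non-trivial as long as
% \ell < \log(1/\varepsilon).
```

Read in the contrapositive, a counterexample in which the tolerated leakage does not grow with $k$ suggests that the hardness does not grow with $k$ either.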


2 Hardness Amplification Definitions and Conjectures 


In this work, we will consider a non-uniform model of computation. We equate 
entities such as challengers and attackers with circuit families, or equivalently, 
Turing Machines with advice. We let n denote the security parameter. We say 
that a function $\varepsilon(n)$ is negligible if $\varepsilon(n) = n^{-\omega(1)}$. 

We begin by defining a general notion of (single prover) cryptographic games, 
which captures the security of the vast majority of cryptographic primitives, 
such as one-way functions, signatures, etc. 


Definition 1 (Games). A game is defined by a probabilistic interactive 
challenger $C$. On security parameter $n$, the challenger $C(1^n)$ interacts with some 
attacker $A(1^n)$ and may output a special symbol win. If this occurs, we say that 
$A(1^n)$ wins $C(1^n)$. 


We can also define a class $\mathcal{C}$ of cryptographic games $C \in \mathcal{C}$. For example, the 
factoring problem fixes a particular game with the challenger $C_{\mathrm{FACTOR}}$ that 
chooses two random $n$-bit primes $p, q$, sends $N = p \cdot q$ to $A$, and outputs win 
iff it gets back $p, q$. On the other hand, one-way functions can be thought of 
as a class of games $\mathcal{C}_{\mathrm{OWF}}$, where each candidate one-way function $f$ defines 


3 E.g., the hard problem could ask to break either the actual instance or the common 
parameter. While such an example does not necessarily contradict leakage amplifi- 
cation, it clearly violates hardness amplification. 
a particular game $C_f \in \mathcal{C}_{\mathrm{OWF}}$. So far, this definition of games and classes of 
games such as one-way functions is purely syntactic and we now define what it 
means for a game to be hard. 


Definition 2 (Hardness). We say that the game $C$ is $(s(n), \varepsilon(n))$-hard if, for 
all sufficiently large $n \in \mathbb{N}$ and all $A(1^n)$ of size $s(n)$, we have 


$$\Pr[A(1^n) \text{ wins } C(1^n)] < \varepsilon(n).$$


We say that the game $C$ is $(\mathrm{poly}, \varepsilon(n))$-hard if it is $(s(n), \varepsilon(n))$-hard for all 
polynomial $s(n)$. We say that the game $C$ is $(\mathrm{poly}, \mathrm{negl})$-hard if it is $(s(n), 1/p(n))$-hard 
for all polynomials $s(n), p(n)$. 


Definition 3 (Direct Product). For a cryptographic game $C$ we define the 
$k$-wise direct-product game $C^k$, which initializes $k$ independent copies of $C$ and 
outputs the win symbol if and only if all $k$ copies individually output win. 
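
A minimal sketch of the direct-product game for a non-interactive toy game follows. The toy game (guess a secret in $\{0,1,2,3\}$ given only its parity) and the attacker are assumptions made for illustration; the definition above also covers interactive challengers, which this sketch does not model.

```python
import random


def sample_instance(rng):
    """Toy challenger: pick a secret in {0,1,2,3} and reveal only its parity."""
    secret = rng.randrange(4)
    return secret % 2, secret          # (challenge, secret kept by the challenger)


def check(secret, answer):
    """The attacker wins a single copy iff it names the secret exactly."""
    return answer == secret


def play_direct_product(attacker, k, rng):
    """Run k independent copies; output win iff all k copies output win."""
    instances = [sample_instance(rng) for _ in range(k)]
    challenges = tuple(ch for ch, _ in instances)
    answers = attacker(challenges)
    return len(answers) == k and all(
        check(secret, ans) for (_, secret), ans in zip(instances, answers)
    )


def parity_attacker(challenges):
    """Answer each copy with its parity, which is right for half of the secrets."""
    return tuple(challenges)


def win_rate(attacker, k, trials=20000, seed=0):
    """Estimate the winning probability against the k-wise direct product."""
    rng = random.Random(seed)
    return sum(play_direct_product(attacker, k, rng) for _ in range(trials)) / trials


print(win_rate(parity_attacker, 1), win_rate(parity_attacker, 3))
```

For this attacker the estimated rates should be near $1/2$ for one copy and near $1/8$ for three copies, the behaviour one would naively hope for from every attacker; the counterexamples in this paper show that such exponential decay cannot be taken for granted in general.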


Finally, we are ready to formally define what we mean by hardness amplification. 
Since we focus on negative results, we will distinguish between several broad 
levels of hardness amplification and ignore exact parameters. For example, we 
do not pay attention to the number of repetitions k needed to reach a certain 
level of hardness (an important parameter for positive results), but are more 
concerned with which levels of hardness are or are not reachable altogether. 


Definition 4 (Hardness Amplification). For a fixed game $C$, we say that 
hardness amplifies to $\varepsilon = \varepsilon(n)$ if there exists some polynomial $k = k(n)$ such 
that $C^k$ is $(\mathrm{poly}, \varepsilon)$-hard. We say that hardness amplifies to negligible if there 
exists some polynomial $k = k(n)$ such that $C^k$ is $(\mathrm{poly}, \mathrm{negl})$-hard. For a class $\mathcal{C}$ 
of games, we say that: 


1. The hardness of a class $\mathcal{C}$ amplifies to negligible if, for every game $C \in \mathcal{C}$ 
which is $(\mathrm{poly}, 1/2)$-hard, the hardness of $C$ amplifies to negligible. 

2. The hardness of a class $\mathcal{C}$ amplifies to $\varepsilon(n)$ if, for every game $C \in \mathcal{C}$ which 
is $(\mathrm{poly}, 1/2)$-hard, the hardness of $C$ amplifies to $\varepsilon(n)$. 

3. The hardness of a class $\mathcal{C}$ amplifies beyond negligible if there exists some 
global negligible function $\varepsilon(n)$ for the entire class, such that the hardness of 
$\mathcal{C}$ amplifies to $\varepsilon(n)$. 


Remarks on Definition. The standard “direct product theorems” for classes 
such as one-way functions/relations and signatures show that the hardness of 
the corresponding class amplifies to negligible (bullet 1). For example, if we take 
any $(\mathrm{poly}, 1/2)$-hard function $f$, then a sufficiently large direct product $f^k$ will be 
$(\mathrm{poly}, \mathrm{negl})$-hard.⁴ However, what “negligible” security can we actually get? The 
result does not say and it may depend on the function $f$ that we start with.⁵ One 


4 The choice of 1/2 is arbitrary and can be replaced with any constant or even any 
function bounded-away-from 1. We stick with 1/2 for concreteness and simplicity. 

5 It also seemingly depends on the exact polynomial size s(n) of the attackers we are 
trying to protect against. However, using a result of Bellare [4], the dependence on 
s(n) can always be removed. 
could conjecture that there is some fixed negligible $\varepsilon(n)$ such that a sufficiently 
large direct product of any weak instantiation will amplify its hardness to $\varepsilon(n)$. 
This is amplification beyond negligible (bullet 3). More ambitiously, we could 
expect that this negligible $\varepsilon(n)$ is very small, such as $\varepsilon(n) = 2^{-n^{\Omega(1)}}$ or even 
$2^{-\Omega(n)}$. We explicitly state these conjectures below. 


Dream Conjecture (Weaker): For any class of cryptographic games C for 
which hardness amplifies to negligible, it also amplifies beyond negligible. 


Dream Conjecture (Stronger): For any class of cryptographic games C for 
which hardness amplifies to negligible, it also amplifies to e(n) = ae 

Our work gives counterexamples to both conjectures. We give two very 
different types of counterexamples: one for the class of signature schemes (Section 
3) and one for the class of one-way functions (Section 4). Our counterexamples 
naturally require that some hard instantiations of these primitives exist to 
begin with, and our counterexamples for the weaker versions of the dream 
conjecture will actually require the existence of exponentially hard versions of these 
primitives. In particular, under strong enough assumptions, we will show that 
for every negligible function $\varepsilon(n)$ there is a stand-alone scheme which is already 
$(\mathrm{poly}, \mathrm{negl})$-hard, but whose $k$-wise direct product is not $(\mathrm{poly}, \varepsilon(n))$-hard, no 
matter how large $k$ is. 


2.1 Hard and One-Way Relations 


As a component of both counterexamples, we will rely on the following definition 
of hard relations phrased in the framework of cryptographic games: 


Definition 5 (Hard Relations). Let $R \subseteq \bigcup_{n \in \mathbb{N}} \{0,1\}^n \times \{0,1\}^{p(n)}$ be an NP 
relation consisting of pairs $(y, w)$ with instances $y$ and witnesses $w$ of polynomial 
size $p(|y|)$. Let $L = \{y : \exists w \text{ s.t. } (y,w) \in R\}$ be the corresponding NP language. 
Let $y \leftarrow \mathrm{SAML}(1^n)$ be a PPT algorithm that samples values $y \in L$. For a relation 
$\mathcal{R} = (R, \mathrm{SAML})$, we define the corresponding security game where the challenger 
$C(1^n)$ samples $y \leftarrow \mathrm{SAML}(1^n)$ and the adversary wins if it outputs $w$ s.t. $(y, w) \in 
R$. By default, we consider $(\mathrm{poly}, \mathrm{negl})$-hard relations, but we can also talk about 
$(s(n), \varepsilon(n))$-hard relations. 


Note that, for hard relations, we only require that there is an efficient algorithm 
for sampling hard instances y. Often in cryptography we care about a sub-class 
of hard relations, which we call one-way relations, where it is also feasible to 
efficiently sample a hard instance y along with a witness w. We define this below. 


Definition 6 (One-Way Relation). Let $R$ be an NP relation and $L$ be the 
corresponding language. Let $(y,w) \leftarrow \mathrm{SAMR}(1^n)$ be a PPT algorithm that samples 
values $(y,w) \in R$, and define $y \leftarrow \mathrm{SAML}(1^n)$ to be a restriction of SAMR 
to its first output. We say that $(R, \mathrm{SAMR})$ is a one-way relation if $(R, \mathrm{SAML})$ is 
a hard relation. 
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3 Counterexample for Signature Schemes 


3.1 Overview 


The work of [11] shows that the direct product of any stateless signature scheme
amplifies hardness to negligible. We now show that it does not (in general)
amplify hardness beyond negligible. In fact, we will give a transformation from any
standard signature scheme (Gen, Sign, Verify) into a new signature scheme (GEN,
SIGN, VERIFY) whose hardness does not amplify (via a direct product) beyond
negligible. We start by giving an informal description of the transformation to
illustrate our main ideas. In order to convey the intuition clearly, we will first
consider a simplified case where the signing algorithm SIGN of the (modified)
scheme (GEN, SIGN, VERIFY) is stateful, and will then discuss how to convert
the stateful signing algorithm into one that is stateless.


Embedding MPC in Signatures. Let (Gen, Sign, Verify) be any standard signature
scheme. Let F = {F_k}_{k∈N} be a randomized k-party “ideal functionality” that
takes no inputs and generates a random instance y of a hard relation R =
(R, SAML) according to the distribution SAML. Further, let Π = {Π_k}_{k∈N} be a
multi-party computation protocol that securely realizes the functionality F for
any number of parties k. Then, the new signature scheme (GEN, SIGN, VERIFY)
works as follows.

Algorithms GEN and VERIFY are identical to Gen and Verify respectively. The
signing algorithm SIGN is essentially the same as Sign, except that, on receiving
signing queries of a “special form”, SIGN interprets these as “protocol messages”
for Π, and (in addition to generating a signature of them under Sign) also
executes the next message function of the protocol and outputs its response as
part of the new signature. A special initialization query specifies the number of
parties k involved in the protocol and the role P_i in which the signing algorithm
should act. The signing algorithm then always acts as the honest party P_i while
the user submitting signing queries can essentially play the role of the remaining
k - 1 parties. When Π_k is completed yielding some output y (interpreted as
the instance of a hard relation R) the signing algorithm SIGN will look for a
signing query that contains a corresponding witness w, and, if it receives one,
will respond to it by simply outputting its entire secret key in the signature. The
security of the transformed signature (GEN, SIGN, VERIFY) immediately follows
from the security of the MPC protocol Π_k against all-but-one corruptions, the
hardness of the relation R and the security of the original signature scheme.


Attacking the Direct-Product. Let us briefly demonstrate an adversary A for
the k-wise direct product. Very roughly, A carefully chooses his signing queries
so as to force SIGN_1, ..., SIGN_k to engage in a single execution of the protocol
Π_k, where each SIGN_i plays the role of a different party P_i, while A simply
acts as the “communication link” between them. This results in all component
schemes SIGN_i generating a common instance y of the hard relation. Finally, A
simply “guesses” a witness w for y at random and, if it succeeds, submits w
as a signing query, thereby learning the secret key of each component signature
scheme and breaking all k of them! Note that the probability of guessing w
is bounded by some negligible function in n and is independent of the number
of parallel repetitions k.


6 We note that in the setting of stateful signatures, hardness fails to amplify even
to negligible since we can embed the counterexamples of into the signature
scheme. Nevertheless our initial description of our counterexample for the stateful
setting will clarify the main new result, which is a counterexample for the stateless
setting.


Stateful to Stateless. While the above gives us a counterexample for the case
where SIGN is a stateful algorithm, (as stated above) we are mainly interested
in the (standard) case where SIGN is stateless. In order to make SIGN a stateless
algorithm, we can consider a natural approach where we use a modified version
Π'_k of protocol Π_k: each party P_i in Π'_k computes an outgoing message in
essentially the same manner as in Π_k, except that it also attaches an
authenticated encryption of its current protocol state, as well as the previous protocol
message. This allows each (stateless) party P_i to “recover” its state from the
previous round to compute its protocol message in the next round.
Unfortunately, this approach is insufficient, and in fact insecure, since an adversarial
user can reset the (stateless) signing algorithm at any point and achieve the
effect of rewinding the honest party (played by the signing algorithm) during
the protocol Π_k. To overcome this problem, we leverage techniques from the
notion of resettably-secure computation. Specifically, instead of using a standard
MPC protocol in the above construction, we use a recent result of Goyal
and Maji [15] which constructs an MPC protocol that is secure against reset
attacks and works for stateless parties for a large class of functionalities, including
“inputless” randomized functionalities (that we will use in this paper).

The above intuitive description hides many details of how the user can actually 
“drive” the MPC execution between the k signers within the direct-product game 
where all signers respond to a single common message. We proceed to make this 
formal in the following section. 


3.2 Our Signature Scheme 


We now give our transformation from any standard signature scheme into one 
whose hardness does not amplify beyond negligible. We first establish some no- 
tation. 


Notation. Let n be the security parameter. Let (Gen, Sign, Verify) be any
standard signature scheme. Further, let (R, SAML) be a hard relation as per
Definition 5. Let {PRF_K : {0,1}^{poly(n)} → {0,1}^{poly(n)}}_{K∈{0,1}^{poly(n)}} be a
pseudo-random function family.


Stateless MPC. We consider a randomized k-party functionality F = {F_k}_{k∈N}
that does not take any inputs; F simply samples a random instance y ← SAML(1^n)
and outputs y to all parties. Let {Π_k}_{k∈poly(n)} be a family of protocols, where
each Π_k = {P_1, ..., P_k} is a k-party MPC protocol for computing the
functionality F in the public state model. This model is described formally in the full
version, and we only give a quick overview here. Each party P_i is completely
described by the next message function NM_i, which takes the following three values
as input: (a) a string π_{j-1} that consists of all the messages sent in any round
j - 1 of the protocol, (b) the public state state_i of party P_i, and (c) the secret
randomness r_i. On receiving an input of the form π_{j-1}||state_i||r_i, NM_i outputs
P_i's message in round j along with the updated value of state_i. We assume that
an attacker corrupts (exactly) k - 1 of the parties. In the real-world execution,
the attacker can arbitrarily call the next-message function NM_i of the honest
party P_i with arbitrarily chosen values of the public state state_i and arbitrary
message π_{j-1} (but with an honestly chosen and secret randomness r_i).
Nevertheless, the final output of P_i and the view of the attacker can be simulated in
the ideal world where the simulator can “reset” the ideal functionality. In our
case, that means that the attacker can adaptively choose one of polynomially
many honestly chosen instances y_1, ..., y_q of the hard relation which P_i will then
accept as output.


The Construction. We describe our signature scheme (GEN, SIGN, VERIFY).

GEN(1^n): Compute (pk, sk) ← Gen(1^n). Also, sample a random tape K ←
{0,1}^{poly(n)} and a random identity id ∈ {0,1}^n. Output PK = (pk, id) and
SK = (sk, K, id).

SIGN(SK, m): To sign a message m using secret key SK = (sk, K, id), the signer
outputs a signature σ = (σ^1, σ^2) where σ^1 ← Sign(sk, m). Next, if m does
not contain the prefix “prot”, then simply set σ^2 = {0}. Otherwise, parse
m = (“prot”||IM||π_j||state||w), where IM = k||id_1||...||id_k and state =
state_1||...||state_k, then do the following:

- Let i ∈ [k] be such that id = id_i. Compute r_i = PRF_K(IM). Then, apply the
next message function NM_i of (stateless) party P_i in protocol Π_k over the
string π_j||state_i||r_i and set σ^2 to the output value.⁷

- Now, if σ^2 contains the output y of protocol Π_k,⁸ then further check whether
(y, w) ∈ R. If the check succeeds, set σ^2 = SK.

VERIFY(PK, m, σ): Given a signature σ = (σ^1, σ^2) on message m with respect
to the public key PK = (pk, id), output 1 iff Verify(pk, m, σ^1) = 1.


7 Note that here σ^2 consists of party P_i's protocol message in round j + 1, and its
updated public state state_i.

8 Note that this is the case when j is the final round in Π_k. Here we use the
property that the last round of Π_k is the output delivery round, and that when NM_i is
computed over the protocol messages of this round, it outputs the protocol output.

This completes the description of our signature scheme. In the full version, we
prove the following theorem showing that the signature scheme satisfies basic
signature security.
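
To make the control flow of SIGN concrete, the following Python sketch mirrors the steps of the construction. It is an illustration under several assumptions that are not part of the paper: messages of the special form are given as a structured tuple (the hypothetical ProtMsg) instead of a parsed bit string, PRF_K is instantiated with HMAC-SHA256, and the base signing algorithm, the next-message function and the relation check are passed in as plain callables.

```python
import hashlib
import hmac
from typing import NamedTuple, Tuple


class ProtMsg(NamedTuple):
    """Hypothetical structured form of "prot"||IM||pi_j||state||w."""
    ids: Tuple[bytes, ...]      # IM = k||id_1||...||id_k, with k = len(ids)
    pi: bytes                   # pi_j: the messages of the previous round
    states: Tuple[bytes, ...]   # state_1, ..., state_k
    w: bytes                    # candidate witness


def prf(key: bytes, data: bytes) -> bytes:
    """PRF_K, instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def make_sign(base_sign, next_message, in_relation):
    """Build SIGN from Sign, a next-message function NM, and the relation R."""
    def sign(SK, m):
        sk, K, ident = SK
        payload = m if isinstance(m, bytes) else repr(tuple(m)).encode()
        sigma1 = base_sign(sk, payload)
        sigma2 = b"\x00"  # default second component
        if (isinstance(m, ProtMsg) and ident in m.ids
                and len(m.states) == len(m.ids)):
            i = m.ids.index(ident)
            im = len(m.ids).to_bytes(4, "big") + b"".join(m.ids)
            r_i = prf(K, im)  # randomness is a function of IM only: no state kept
            out_msg, new_state, y = next_message(
                i, len(m.ids), m.pi, m.states[i], r_i)
            sigma2 = (out_msg, new_state)
            if y is not None and in_relation(y, m.w):
                sigma2 = SK  # the planted weakness: leak the whole secret key
        return sigma1, sigma2
    return sign
```

The signer keeps no state between calls: everything it needs is either in the query (the public state) or recomputed from K and IM, which is what makes resets by the user a concern and motivates the stateless MPC protocol.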


Theorem 1. If (Gen, Sign, Verify) is a secure signature scheme, {PRF_K} is a
PRF family, R is a hard relation, and Π_k is a stateless MPC protocol for
functionality F, then the proposed scheme (GEN, SIGN, VERIFY) is a secure signature
scheme.


3.3 Attack on the Direct Product 


Theorem 2. Let (GEN, SIGN, VERIFY) be the described signature scheme and
let R = (R, SAML) be the hard relation used in the construction. Assume that for
any y ← SAML(1^n), the size of the corresponding witness w is bounded by |w| =
p(n). Then, for any polynomial k = k(n), there is an attack against the k-wise
direct product running in time poly(n) with success probability ε(n) = 2^{-p(n)}.


We will prove Theorem 2 by constructing an adversary A that mounts a
key-recovery attack on any k-wise direct product of the signature scheme (GEN,
SIGN, VERIFY).


k-wise Direct Product. Let (Gen^k, Sign^k, Verify^k) denote the k-wise direct
product of the signature scheme (GEN, SIGN, VERIFY), described as follows.
The algorithm Gen^k runs GEN k times to generate (PK_1, SK_1), ..., (PK_k, SK_k).
To sign a message m, Sign^k computes σ_i ← SIGN(SK_i, m) for every i ∈ [k] and
outputs σ = (σ_1, ..., σ_k). Finally, on input a signature σ = (σ_1, ..., σ_k) on
message m, Verify^k outputs 1 iff ∀i ∈ [k], VERIFY(PK_i, m, σ_i) = 1.
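
The direct product is a purely generic wrapper, which the following Python sketch tries to make explicit. The function direct_product and the toy_* scheme are names introduced for this illustration; in particular the toy scheme is an HMAC whose “public” key equals its secret key, so it is only a placeholder and not a signature scheme in any meaningful sense.

```python
import hashlib
import hmac
import secrets


def direct_product(gen, sign, verify, k: int):
    """Return (gen_k, sign_k, verify_k), the k-wise direct product."""
    def gen_k():
        pairs = [gen() for _ in range(k)]
        return [pk for pk, _ in pairs], [sk for _, sk in pairs]

    def sign_k(sks, m):
        # Every component signs the same message m.
        return [sign(sk, m) for sk in sks]

    def verify_k(pks, m, sigmas):
        # Accept only if all k component signatures verify.
        return len(sigmas) == len(pks) and all(
            verify(pk, m, s) for pk, s in zip(pks, sigmas))

    return gen_k, sign_k, verify_k


# Placeholder component scheme (NOT a real signature scheme).
def toy_gen():
    key = secrets.token_bytes(16)
    return key, key  # "public" key equals the secret key


def toy_sign(sk, m):
    return hmac.new(sk, m, hashlib.sha256).digest()


def toy_verify(pk, m, sigma):
    return hmac.compare_digest(toy_sign(pk, m), sigma)
```

Because all k signers answer the same query m, a single query can carry one protocol round to every component at once; the adversary described next relies on exactly this.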


Description of A. We now describe the adversary A for the k-wise direct product
scheme. Let (PK_1, ..., PK_k) denote the public key that A receives from the
challenger of the direct product scheme, where each PK_i = (pk_i, id_i). The
adversary A first sends a signing query m_0 of the form “prot”||IM||π_0||state||w,
where IM = k||id_1||...||id_k, and π_0 = state = w = {0}. Let σ = (σ_1, ..., σ_k) be
the response it receives, where each σ_i = (σ_i^1, σ_i^2). A now parses each σ_i^2 as a
first round protocol message π_1^i from party P_i followed by the public state state_i
of P_i (at the end of the first round) in protocol Π_k.

A now prepares a new signing query m_1 of the form “prot”||IM||π_1||state||w,
where IM and w are the same as before, but π_1 = π_1^1||...||π_1^k, and state =
state_1||...||state_k. On receiving the response, A repeats the same process as
above to produce signing queries m_2, ..., m_{t-1}, where t is the total number of
rounds in protocol Π_k. (That is, each signing query m_2, ..., m_{t-1} is prepared in
the same manner as m_1.)

Finally, let σ = (σ_1, ..., σ_k) be the response to the signing query m_{t-1}. A now
parses each σ_i^2 as the round t protocol message π_t^i from party P_i followed by the
state state_i of P_i. Now, since the final round (i.e., round t) of protocol Π_k is the
output delivery round, and further, Π_k satisfies the publicly computable output
property, A simply computes the protocol output y from the messages π_t^1, ..., π_t^k.

Now, A guesses a p(n)-sized witness w* ← {0,1}^{p(n)} at random and, if (y, w*) ∈
R, it now sends the final signing query m_t = “prot”||IM||π_t||state||w, where
IM is the same as before, π_t = π_t^1||...||π_t^k, state = state_1||...||state_k, and w =
w*. Thus, A obtains SK_1, ..., SK_k from the challenger and can forge arbitrary
signatures for the direct product scheme. It's clear that its success probability
is at least 2^{-p(n)}.


Corollary 1. Assuming the existence of hard relations and general stateless MPC
compilers, the hardness of signature schemes does not amplify to any ε(n) =
2^{-n^δ}. This gives a counterexample to the strong dream conjecture. If we, in
addition, assume the existence of (2^{Ω(n)}, 2^{-Ω(n)})-hard relations with witness size
p(n) = O(n), then there exist signature schemes whose hardness does not amplify
beyond negligible. This gives a counterexample to the weak dream conjecture.


Proof. For the first result, assume that the witness size of the relation R is
bounded by p(n) = O(n^c) for some constant c. Given any constant δ > 0, we
can simply instantiate the signature scheme (GEN, SIGN, VERIFY) used in our
counterexample with the hard relation R' that uses security parameter m(n) =
n^{δ/c} so that its witness size is p'(n) = p(m) = O(n^δ). It's clear that R' is still
(poly(n), negl(n))-secure but, by Theorem 2, the k-wise direct product can be
broken in poly(n) time with probability ε(n) = 2^{-O(n^δ)}. Therefore security does
not amplify to 2^{-n^δ} for any δ > 0. The second part of the corollary follows in the
same way, except that, for any fixed negligible function δ(n) we set m(n) =
-log(δ(n)).
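
For readers who want the parameter substitution of the first part spelled out, it can be written as follows. The final inequality, with an auxiliary constant δ' > δ, is our own way of absorbing the constant hidden in the O-notation and is not stated in the proof itself.

```latex
\begin{align*}
p'(n) &= p\bigl(m(n)\bigr) = O\bigl(m(n)^{c}\bigr)
       = O\bigl((n^{\delta/c})^{c}\bigr) = O(n^{\delta}), \\
\varepsilon(n) &= 2^{-p'(n)} = 2^{-O(n^{\delta})}
       \;\ge\; 2^{-n^{\delta'}}
       \quad \text{for every constant } \delta' > \delta
       \text{ and all sufficiently large } n .
\end{align*}
```

Since δ > 0 is arbitrary, so is δ', which gives the claimed statement for every positive constant exponent.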


4 Counterexample for One-Way Relations and Functions 


In Section 3 we proved that there exist signature schemes whose hardness does
not amplify. This already rules out the general conjecture that “for any game for
which hardness amplifies to negligible, hardness will also amplify to exponential
(or at least beyond negligible)”. Nevertheless, one might still think that the
conjecture holds for more restricted classes of games. Perhaps the simplest such
class to consider is one-way functions. Note that, unlike the case for signature
schemes, the one-wayness game does not allow interaction and has bounded
communication between attacker and challenger. Thus, the general strategy we
employed in Section 3 of embedding a multiparty computation inside signature
queries, will no longer work. In this section, we propose an alternate strategy for
showing that one-way relation hardness does not amplify beyond negligible.


4.1 Our Construction 


We begin by giving a counterexample for hard relations. We then extend it to
counterexamples for one-way relations and one-way functions. Our constructions
are based on a new (non-standard) cryptographic security assumption on
hash functions. Let h : {0,1}^{2n} → {0,1}^n be a hash function. We define a Merkle
path of length ℓ to be a tuple of the form

\[ p_\ell = \bigl(x_0, (b_1, x_1), \ldots, (b_\ell, x_\ell)\bigr) \; : \; b_i \in \{0,1\},\; x_i \in \{0,1\}^n . \]

Intuitively, x_0 could be the leaf of some Merkle tree of height ℓ, and the values
x_1, ..., x_ℓ are the siblings along the path from the leaf to the root, where the
bits b_i indicate whether the sibling x_i is a left or right sibling. However, we can
also talk about a path p_ℓ on its own, without thinking of it as part of a larger
tree. Formally, if p_ℓ is a Merkle path as above, let p_{ℓ-1} be the path with the last
component (b_ℓ, x_ℓ) removed. The value of a Merkle path p_ℓ as above is defined
iteratively via:

\[ h(p_\ell) = \begin{cases}
h\bigl(h(p_{\ell-1}), x_\ell\bigr) & \ell > 0,\ b_\ell = 1 \\
h\bigl(x_\ell, h(p_{\ell-1})\bigr) & \ell > 0,\ b_\ell = 0 \\
x_0 & \ell = 0
\end{cases} \]

We call x_0 the leaf of the path p_ℓ, and z = h(p_ℓ) is its root. We say that
y = (x_L, x_R) ∈ {0,1}^{2n} is the known preimage of the path p_ℓ if x_L, x_R are
the values under the root, so that either x_L = x_ℓ, x_R = h(p_{ℓ-1}) if b_ℓ = 0,
or x_L = h(p_{ℓ-1}), x_R = x_ℓ if b_ℓ = 1. Note that this implies h(y) = h(p_ℓ). We
say that y' ∈ {0,1}^{2n} is a second preimage of the path p_ℓ if y' ≠ y is not
the known preimage of p_ℓ, and h(y') = h(p_ℓ). We are now ready to define the
extended second-preimage resistance (ESPR) assumption. This assumption says
that, given a random challenge x_0 ∈ {0,1}^n, it is hard to find a (short) path p_ℓ
containing x_0 as a leaf, and a second-preimage y' of p_ℓ.


Definition 7 (ESPR). Let h : {{0,1}^{2n} → {0,1}^n}_{n∈N} be a poly-time
computable hash function. We define the Extended Second Preimage Resistance
(ESPR) assumption on h via the following security game between a challenger
and an adversary A(1^n):


1. The challenger chooses x_0 ← {0,1}^n at random and gives it to A.
2. A wins if it outputs a tuple (p_ℓ, y'), where p_ℓ is a Merkle path of length ℓ < n
containing x_0 as a leaf, and y' is a second-preimage of p_ℓ.
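
The winning condition of the ESPR game is mechanical to check, and the following Python sketch spells it out. It is only an illustration: truncated SHA-256 stands in for the fixed hash h, n is 128 bits (N = 16 bytes), a path is encoded as a pair (x0, list of (b, x) steps), and the sketch additionally requires paths of length at least 1 so that a known preimage exists. All names are ours.

```python
import hashlib
from typing import List, Tuple

N = 16  # toy output length n in bytes (an assumption)

Path = Tuple[bytes, List[Tuple[int, bytes]]]  # (x0, [(b1, x1), ..., (bl, xl)])


def h(data: bytes) -> bytes:
    """Stand-in for h: {0,1}^{2n} -> {0,1}^n (truncated SHA-256)."""
    assert len(data) == 2 * N
    return hashlib.sha256(data).digest()[:N]


def path_value(path: Path) -> bytes:
    """Iteratively evaluate h(p_l) following the case definition in the text."""
    val, steps = path
    for b, x in steps:
        # b = 1: running value on the left, sibling x on the right; b = 0: swapped.
        val = h(val + x) if b == 1 else h(x + val)
    return val


def known_preimage(path: Path) -> bytes:
    """The pair (x_L, x_R) directly under the root; needs length >= 1."""
    x0, steps = path
    b, x = steps[-1]
    below = path_value((x0, steps[:-1]))
    return below + x if b == 1 else x + below


def espr_win(x0: bytes, path: Path, y2: bytes) -> bool:
    """Check an adversary output (p_l, y') against the challenge x0."""
    leaf, steps = path
    return (leaf == x0
            and 1 <= len(steps) < 8 * N      # path length below n (in bits)
            and len(y2) == 2 * N
            and y2 != known_preimage(path)   # must differ from the known preimage
            and h(y2) == path_value(path))   # but hash to the same root
```

Note that the adversary is free to choose every sibling on the path; only the leaf x0 is fixed by the challenger.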


Discussion. In the above definition, we want h to be a single fixed hash function
and not a function family. The notion of ESPR security seems to lie somewhere
in between second-preimage resistance (SPR) and collision resistance (CR),
implying the former and being implied by the latter.⁹ Unfortunately, collision
resistance cannot be achieved by any fixed hash function (at least w.r.t. non-uniform
attackers), since the attacker can always know a single hard-coded collision as
auxiliary input. Fortunately, there does not appear to be any such trivial
non-uniform attack against ESPR security, since the attacker is forced to
“incorporate” a random leaf x into the Merkle path on which it finds a collision.
Therefore, in this regard, it seems that ESPR security may be closer to SPR
security, which can be achieved by a fixed hash function (if one-way functions
exist). Indeed, in Section 4.2 we give a heuristic argument that modern (fixed)
cryptographic hash functions already satisfy the ESPR property, even against
non-uniform attackers. We do so by analyzing ESPR security in a variant of the
random-oracle model, where the attacker may observe some “oracle-dependent
auxiliary input”. This model, proposed by Unruh [31], is intended to capture the
properties of hash functions that can be achieved by fixed hash functions, rather
than function families.


9 A hash function is SPR if, given a uniformly random y, it's hard to find any y' ≠ y
such that h(y) = h(y'). It is CR if it is hard to find any y ≠ y' s.t. h(y) = h(y').


A Hard Relation from ESPR. Given a hash function h we can define the NP
relation R_h with statements x ∈ {0,1}^n and witnesses w = (p_ℓ, y') where p_ℓ is a
Merkle path of length ℓ < n containing x as leaf, and y' is a second-preimage of
p_ℓ. The corresponding NP language is defined as L_h = {x : ∃w s.t. (x, w) ∈
R_h}. We say that h is slightly regular, if for every z ∈ {0,1}^n there exist at
least two distinct pre-images y ≠ y' such that h(y) = h(y') = z. If this is the
case, then L_h = {0,1}^n is just the language consisting of all bit strings. Now,
we can define the distribution x ← SAML(1^n) which just samples x ← {0,1}^n
uniformly at random. It is easy to see that, if h is an (s(n), ε(n))-hard ESPR
hash function, then R_h = (R_h, SAML) is an (s(n), ε(n))-hard relation.


Hardness Non-Amplification. We now show our counterexample to the hardness
amplification for the hard relation R_h. The main idea is that, given k random
and independent challenges x^(1), ..., x^(k), the attacker builds a Merkle tree with
the challenges as leaves. Let z be the value at the top of the Merkle tree. Then
the attack just guesses some value y' ∈ {0,1}^{2n} at random and, with probability
≥ 2^{-2n}, y' will be a second-preimage of z (i.e. h(y') = z and y' is distinct from
the known preimage y containing the values under the root). Now, for each leaf
x^(i), let p_ℓ^(i) be the Merkle path for the leaf x^(i). Then the witness w_i = (p_ℓ^(i), y') is
a good witness for x^(i). So, with probability ≥ 2^{-2n} with which the attack correctly
guessed y', it breaks all k independent instances of the relation R_h, no matter
how large k is! By changing the relation R_h = (R_h, SAML) so that, on security
parameter n, the sampling algorithm SAML(1^n) chooses x ← {0,1}^m with m =
m(n) being some smaller function of n such as m(n) = n^δ for a constant δ > 0 or
even m(n) = log^2(n), we can get more dramatic counterexamples where hardness
does not amplify beyond ε(n) = 2^{-n^δ} or even ε(n) = n^{-log n}. We now summarize
the above discussion with a formal theorem.


Theorem 3. Let h be a slightly regular, ESPR-secure hash function and let
R_h = (R_h, SAML) be the corresponding (poly, negl)-hard relation. Then, for any
polynomial k = poly(n), the k-wise direct product of R_h is not (poly, 2^{-2n})-
secure. That is, for any polynomial k, there is a poly-time attack against the
k-wise direct product of R_h, having success probability 2^{-2n}.


Proof. We first describe the attack. The attacker gets k independently
generated challenges x^(1), ..., x^(k). Let ℓ be the unique value such that 2^{ℓ-1} ≤ k < 2^ℓ,
and let k* = 2^ℓ be the smallest power of 2 which is larger than k. Let us define
additional “dummy values” x^(k+1) = ... = x^(k*) := 0^n. The attack constructs a
Merkle tree, which is a full binary tree of height ℓ, whose k* leaves are
associated with the values x^(1), ..., x^(k*). The value of any non-leaf node v is defined
recursively as val(v) = h(val(v_L), val(v_R)) where v_L, v_R are the left and right
children of v respectively. For any leaf v^(i) associated with the value x^(i), let
(v_1 = v^(i), v_2, ..., v_ℓ, r) be the nodes on the path from the leaf v_1 to the root r in
the Merkle tree. The Merkle path associated with the value x^(i) is then defined
by p_ℓ^(i) = (x^(i), (b_1, x_1), ..., (b_ℓ, x_ℓ)) where each x_j is the value associated with
the sibling of v_j, and b_j = 0 if v_j is a right child and 1 otherwise. Note that,
if r is the root of the tree and z = val(r) is the value associated with it, then
h(p_ℓ^(i)) = z for all paths p_ℓ^(i) with i ∈ {1, ..., k*}. Furthermore let us label the
nodes v_L, v_R to be the children of the root r, the values x_L, x_R be the values
associated with them, and set y := (x_L, x_R). Then y is the known preimage such
that h(y) = z, associated with each one of the paths p_ℓ^(i).

The attack guesses a value y' ← {0,1}^{2n} at random and outputs the k-tuple
of witnesses (w_1, ..., w_k) where w_i = (p_ℓ^(i), y'). With probability at least 2^{-2n}, y'
is a second-preimage of z with h(y') = z and y' ≠ y (since h is slightly regular,
such a second preimage always exists). If this is the case, then y' is also a second
preimage of every path p_ℓ^(i). Therefore, with probability ≥ 2^{-2n} the attack finds
a witness for each of the k instances and wins the hard relation game for the
direct product relation R_h^k.
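
The attack is easy to run end to end once the hash is made small enough for the guess to succeed. The Python sketch below does this under loud assumptions: h is SHA-256 truncated to a single byte (n = 8 bits, far too small for any security), and instead of one random guess for y' the code enumerates all 2^{2n} candidates, which is feasible only at this toy size. Function names are ours.

```python
import hashlib
from itertools import product

N = 1  # toy output length in BYTES (n = 8 bits); insecure by design


def h(data: bytes) -> bytes:
    """Toy stand-in for h: {0,1}^{2n} -> {0,1}^n (SHA-256 truncated to N bytes)."""
    return hashlib.sha256(data).digest()[:N]


def build_tree(challenges):
    """Pad with dummy 0^n leaves up to k* (power of two above k), hash upward."""
    k_star = 1
    while k_star <= len(challenges):
        k_star *= 2
    level = list(challenges) + [bytes(N)] * (k_star - len(challenges))
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels  # levels[0] = leaves, levels[-1] = [z]


def merkle_path(levels, i):
    """Path of leaf i: (x, [(b_j, sibling_j)]), b_j = 1 iff the node is a left child."""
    steps, idx = [], i
    for level in levels[:-1]:
        steps.append((1 if idx % 2 == 0 else 0, level[idx ^ 1]))
        idx //= 2
    return levels[0][i], steps


def path_value(path):
    val, steps = path
    for b, x in steps:
        val = h(val + x) if b == 1 else h(x + val)
    return val


def attack(challenges):
    """Return one witness (path, y') per challenge, or None if no y' is found."""
    levels = build_tree(challenges)
    z = levels[-1][0]
    y = levels[-2][0] + levels[-2][1]  # known preimage: values under the root
    # The paper guesses y' at random; at n = 8 we can simply enumerate.
    for cand in product(range(256), repeat=2 * N):
        y2 = bytes(cand)
        if y2 != y and h(y2) == z:
            return [(merkle_path(levels, i), y2) for i in range(len(challenges))]
    return None
```

The single value y' returned by attack is shared by all k witnesses, which is the whole point: the cost of finding it does not grow with the number of challenges.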


Corollary 2. Assuming the existence of slightly regular (poly, negl)-hard ESPR
hash functions, the hardness of hard relations does not amplify to 2^{-n^δ} (for any
constant δ > 0), giving a counterexample to the stronger dream conjecture. If we
instead assume the existence of (2^{Ω(n)}, 2^{-Ω(n)})-hard ESPR hash functions, then the
hardness of hard relations does not amplify beyond negligible, giving a
counterexample to the weaker dream conjecture.


Proof. Let h be the ESPR hash function. We define a modified relation R_h^(m) =
(R_h, SAML) where the sampling algorithm SAML(1^n) samples an instance x ∈
{0,1}^m where m = m(n) is some function of n. For the first part of the corollary,
let δ > 0 be any constant, and set m(n) = n^δ/2. Then R_h^(m) is still a (poly, negl)-
hard relation. However, by applying Theorem 3 with m replacing n, we see
that for any k = poly(m) = poly(n), there is an attack against the k-wise
direct product which succeeds with probability ≥ 2^{-2m} = 2^{-n^δ}. In other words,
for any δ > 0, there is a (poly, negl)-hard relation whose direct product is not
(poly, 2^{-n^δ})-hard, no matter how large k is. This proves the first part of the
corollary. The second part of the corollary works the same way as the first part
but, for any fixed negligible function δ(n) we set m(n) = -(1/2) log(δ(n)), so that
2^{-2m} = δ(n). Assuming that h is a (2^{Ω(n)}, 2^{-Ω(n)})-hard ESPR hash function, the
relation R_h^(m) is then still (poly, negl)-hard, but its direct product is not
(poly, δ(n))-hard. This proves the second part of the corollary.


Extension to One-Way Relations. We can get essentially the same results as
above for one-way relations rather than just hard relations. Assume that R_ow =
(R_ow, SAMR_ow) is any one-way relation, and R_h = (R_h, SAML_h) is the hard
relation used in our counterexample. Define the OR relation R_or = (R_or, SAMR_or)
via:

R_or = {((y_1, y_2), (w_1, w_2)) : (y_1, w_1) ∈ R_h or (y_2, w_2) ∈ R_ow}
SAMR_or(1^n) : Sample y_1 ← SAML_h(1^n), (y_2, w_2) ← SAMR_ow(1^n).
               Output: ((y_1, y_2), (0, w_2)).

Then Theorem 3 applies as-is to the one-way relation R_or replacing R_h, and
Corollary 2 applies to one-way relations as well.
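
The OR construction is a small combinator, sketched below in Python. The helper make_or_relation and the toy_* relations are names introduced for this illustration; the toy relations are hash-based placeholders and are not R_h or R_ow from the paper. The dummy first witness, written 0 in the text, is represented by the empty byte string.

```python
import hashlib
import secrets

N = 16  # toy length in bytes (an assumption)


def make_or_relation(in_rh, sam_lh, in_row, sam_row):
    """Combine a hard relation (R_h, SAML_h) with a one-way relation (R_ow, SAMR_ow)."""
    def in_ror(y, w) -> bool:
        (y1, y2), (w1, w2) = y, w
        return in_rh(y1, w1) or in_row(y2, w2)

    def sam_ror():
        y1 = sam_lh()               # no witness is known for this half
        y2, w2 = sam_row()          # this half comes with a witness
        return (y1, y2), (b"", w2)  # (0, w2) in the text: w1 is a dummy

    return in_ror, sam_ror


# Toy placeholders, NOT the relations of the paper.
def toy_in_row(y, w):
    return hashlib.sha256(w).digest()[:N] == y


def toy_sam_row():
    w = secrets.token_bytes(N)
    return hashlib.sha256(w).digest()[:N], w


def toy_in_rh(y, w):
    return len(w) > 0 and hashlib.sha256(b"h" + w).digest()[:N] == y


def toy_sam_lh():
    return secrets.token_bytes(N)
```

A witness for either half is accepted, so an attacker on the direct product may ignore the one-way half entirely and reuse the attack on the hard-relation half.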


Extension to One-Way Functions. We can also extend the above counterexample
to one-way functions. Let ℓ(n) > n be a polynomial and f : {{0,1}^{ℓ(n)} →
{0,1}^n}_{n∈N} be a regular one-way function so that, for x ← {0,1}^{ℓ(n)}, the
output f(x) is uniformly random over {0,1}^n. Let R = (R, SAML) be the hard
relation for which we have a counterexample, with witness-size bounded by u(n).
We define F : ({0,1}^{ℓ(n)} × {0,1}^n × {0,1}^{u(n)} × {0,1}^n) → {0,1}^n via:

\[ F(x, y, w, z) = \begin{cases}
y & \text{if } (y, w) \in R \wedge z = 0^n \\
f(x) & \text{otherwise.}
\end{cases} \]

Note that the distribution of F(x, y, w, z) is statistically close to that of f(x)
since the probability of z = 0^n is negligible. The preimage of any y ∈ {0,1}^n
is either of the form (·, y, w, ·) where (y, w) ∈ R or of the form (x, ·, ·, ·) where
f(x) = y, and hence breaking the one-wayness of F is no easier than breaking
that of f or breaking the hard relation R. On the other hand, it is possible to
break the k-wise direct product of F just by breaking the k-wise direct product
of the hard relation R. Therefore, the results of Corollary 2 apply to one-way
functions as well, if we also assume the existence of a (fixed) regular one-way
function f (and an exponentially secure one for the counterexample to the weaker
conjecture). In the full version of this work, we also show how to instantiate f
using the ESPR function h so as to get the results of Corollary 2 for one-way
functions, without needing any additional assumptions.
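
The function F is a two-branch case distinction, sketched below in Python. This is an illustration under assumptions: truncated SHA-256 is used as a placeholder for the regular one-way function f (we make no claim that it is regular), lengths are fixed to N = 16 bytes, and the relation check is passed in as a callable. The names make_F and f are ours.

```python
import hashlib
from typing import Callable

N = 16  # toy n in bytes (an assumption)


def f(x: bytes) -> bytes:
    """Placeholder for the regular one-way function f (truncated SHA-256)."""
    return hashlib.sha256(b"f" + x).digest()[:N]


def make_F(in_relation: Callable[[bytes, bytes], bool]):
    """Build F(x, y, w, z) from a membership test for the hard relation R."""
    def F(x: bytes, y: bytes, w: bytes, z: bytes) -> bytes:
        # Special branch: z is all zeros and w is a valid witness for y.
        if z == bytes(N) and in_relation(y, w):
            return y
        # Otherwise F behaves exactly like f on its first argument.
        return f(x)
    return F
```

On a random input the special branch is taken only when z = 0^n, which happens with negligible probability; an inverter given a target y, however, may choose z = 0^n itself and then only needs a witness for y.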


4.2 Justifying the ESPR Assumption


We now give some justification that ESPR hash functions may exist by showing
how to construct them in a variant of the random-oracle (RO) model. Of
course, constructions in the random-oracle model do not seem to offer any
meaningful guarantees for showing that the corresponding primitive may be realized
by a fixed hash function: indeed the RO model immediately implies collision
resistance which cannot be realized by a fixed hash function. Rather, the RO
model is usually interpreted as implying that the given primitive is likely to be
realizable by a family of hash functions. Therefore, we will work with a variant
of the RO model in which the attacker is initialized with some arbitrary
“oracle-dependent auxiliary input”. This model was proposed by Unruh [31] with the
explicit motivation of capturing properties that can be satisfied by a fixed hash
function. For example, the auxiliary input may include some small number of
fixed collisions on the RO and therefore collision-resistance is unachievable in
this model. By showing that ESPR security is achievable, we provide some
justification for this assumption.

Let O : {0,1}^{2n} → {0,1}^n be a fixed length random oracle. Following [31],
we define “oracle-dependent auxiliary input” of size p(n) as an arbitrary function
z : {{0,1}^{2n} → {0,1}^n} → {0,1}^{p(n)} which can arbitrarily “compress” the
entire oracle O into p(n) bits of auxiliary information z(O). When considering
security games in the oracle-dependent auxiliary input model, we consider
attackers A^O(z(O)) which are initialized with polynomial-sized oracle-dependent
auxiliary input z(·). In the full version, we show that the ESPR security game
is hard in the random oracle model with auxiliary input.
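
The remark that collision resistance is unachievable in this model can be seen in a toy experiment, sketched below in Python. The oracle is a small explicit table (n = 4 bits in the usage example), the auxiliary input z(O) is a single collision found by scanning the whole table, and the attacker makes no oracle queries at all. All names, and the use of a seeded pseudo-random generator to sample the table, are assumptions of the sketch.

```python
import random


def sample_oracle(n_bits: int, seed: int):
    """A toy random oracle O: {0,1}^{2n} -> {0,1}^n stored as a full table."""
    rng = random.Random(seed)
    return [rng.randrange(2 ** n_bits) for _ in range(2 ** (2 * n_bits))]


def aux_input(oracle):
    """z(O): unbounded preprocessing that compresses O into one collision."""
    seen = {}
    for x, y in enumerate(oracle):
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None  # unreachable when the domain is larger than the range


def cr_attacker(z):
    """Collision finder that makes no oracle queries: it outputs its advice."""
    return z
```

For example, with O = sample_oracle(4, seed=1), the pair cr_attacker(aux_input(O)) is a collision of O. No comparable shortcut is apparent for ESPR, because the advice is fixed before the random leaf x_0 is chosen.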


Theorem 4. Let O be modeled as a random oracle, and consider the ESPR
game in which h is replaced with O. Then, for any attacker A^O(z(O)) with
polynomial-sized auxiliary input z(·) and making at most polynomially many
queries to O, its probability of winning the ESPR game is at most ε = 2^{-Ω(n)}.


Abstract. Two central notions of Zero Knowledge that provide strong,
yet seemingly incomparable security guarantees against malicious 
verifiers are those of Statistical Zero Knowledge and Resettable Zero 
Knowledge. The current state of the art includes several feasibility and 
impossibility results regarding these two notions separately. However, the 
question of achieving Resettable Statistical Zero Knowledge (i.e., Reset- 
table Zero Knowledge and Statistical Zero Knowledge simultaneously) 
for non-trivial languages remained open. In this paper, we show: 


- Resettable Statistical Zero Knowledge with unbounded prover: under
the assumption that sub-exponentially hard one-way functions
exist, rSZK = SZK. In other words, every language that admits a
Statistical Zero-Knowledge (SZK) proof system also admits a
Resettable Statistical Zero-Knowledge (rSZK) proof system. (Further,
the result can be re-stated unconditionally provided there exists a
sub-exponentially hard language in SZK). Moreover, under the
assumption that (standard) one-way functions exist, all languages L
such that the complement of L is random self reducible, admit an
rSZK proof system; in other words: co-RSR ⊆ rSZK.

- Resettable Statistical Zero Knowledge with efficient prover:
efficient-prover Resettable Statistical Zero-Knowledge proof systems exist for
all languages that admit hash proof systems (e.g., QNR, QR, DDH,
DCR). Furthermore, for these languages we construct a two-round
resettable statistical witness-indistinguishable argument system.


The round complexity of our proof systems is O(log κ), where κ is the
security parameter, and all our simulators are black-box.


1 Introduction 


The notion of a Zero-Knowledge (ZK, for short) Proof System introduced by
Goldwasser, Micali and Rackoff [19] is central in Cryptography. Since its
introduction, the concept of a ZK proof has been extremely influential and useful
for many other notions and applications (e.g., multi-party computation [18],
CCA encryption [27]). Moreover, the original definition has since been extended
under several variations, trying to capture additional security guarantees. Well
known examples are the notions of non-malleable ZK introduced by Dolev,
Dwork and Naor, which concerns security against man-in-the-middle attacks, of
ZK arguments introduced by Brassard, Chaum and Crepeau where soundness
is guaranteed only with respect to probabilistic polynomial-time adversarial
provers, and of concurrent ZK [16] introduced by Dwork, Naor and Sahai, which
concerns security against concurrent malicious verifiers. Another important
variant is that of Statistical Zero Knowledge [19]3[33], where it is guaranteed that
a transcript of a proof will remain zero knowledge even against computationally
unbounded adversaries.

An important model of security against malicious verifiers, known as Reset- 
table Zero-Knowledge, was introduced by Canetti, Goldreich, Goldwasser and 
Micali in [5]. In this setting, the malicious verifier is allowed to reset the prover, 
and make it re-use its randomness for proving new theorems. Indeed, one of 
the main motivations for studying resettable ZK was to understand the conse- 
quences of re-using limited randomness on the zero-knowledge property. In [5], it 
was shown that computational zero-knowledge for all of NP is possible even in 
this highly adversarial setting. Although resettable zero knowledge has received 
considerable attention since its inception (see for example [1,24,13,39,12,8,35]), 
almost all the work has been focused on the computational setting. 

In this work, we continue the line of research on resettable ZK by investigat- 
ing the question of resettability when the zero-knowledge property is required 
to be statistical, i.e., Resettable Statistical Zero Knowledge. This model con- 
strains the prover strategy severely: not only should the prover somehow re-use 
its limited randomness, it must do so in a way that makes the transcript of the 
proof statistically secure. Known solutions in the setting of computational reset- 
table ZK involve converting prover’s bounded randomness to unbounded pseudo- 
randomness by using pseudo-random functions (PRF). However, this approach 
fails in our case, as an unbounded adversary can break the PRF and gain critical 
information, breaking zero knowledge. In this paper, we develop a new technique 
to handle this problem. Using this technique, we study resettable statistical zero 
knowledge in the form of the following two distinct questions. 


— Do there exist efficient-prover resettable statistical ZK proofs? This question 
is motivated by practical applications of resettable ZK, for example, in smart 
cards. If a prover is to be implemented in a small device like a smart card, 
it is essential that the prover strategy is polynomial-time. 

— What languages in SZK have resettable statistical ZK proofs? The class 
SZK is the class of problems which admit statistical zero-knowledge proofs. 
This question is purely theoretical in nature, and tries to ascertain the 
difficulty of achieving resettability where statistical zero-knowledge already 
exists. In this setting we consider provers that are forced into giving 
multiple proofs using the same limited random coins. This work can be thought 
of as a natural extension of the recent work on Concurrent Statistical 
Zero-Knowledge (cSZK) [25,30]. 


1.1 Our Contribution 


In this paper we address the above questions and present the following results. 
We stress that our techniques may be of independent interest. 


Resettable Statistical Zero Knowledge with efficient prover. We show the exis- 
tence of efficient-prover resettable statistical ZK proof systems for all languages 
in SZK that admit hash proof systems [10] (e.g., Quadratic Non-Residuosity 
(QNR), Decisional Diffie-Hellman (DDH), Decisional Composite Residuosity 
(DCR)). Therefore, our techniques show that efficient-prover resettable statisti- 
cal ZK proof systems also exist for non-trivial languages (like DDH) where each 
instance is associated to more than one witness, where intuitively reset attacks 
are harder to deal with.¹ Furthermore, using our techniques, for these languages 
we also construct a two-round resettable statistical witness-indistinguishable ar- 
gument system. 


Resettable Statistical Zero Knowledge with unbounded prover. We show that 
if a family of sub-exponentially hard one-way functions exists then rSZK = 
SZK, i.e., all languages that admit a statistical ZK proof system also admit a 
resettable statistical ZK proof system. If there exists an SZK language L which 
is (worst-case) sub-exponentially hard for all input lengths,² then rSZK = SZK 
without any additional assumptions, as this already implies the existence of 
sub-exponentially hard one-way functions [29]. Informally, a sub-exponentially 
hard one-way function is a one-way function that is secure against circuits of 
sub-exponential size (2^{κ^ε} for some 0 < ε < 1). Moreover, we show that if a 
family of (standard) one-way functions exists (or, if there are languages which 
are hard on the average and admit statistical zero-knowledge proofs [29]) then 
co-RSR ⊆ rSZK. Our results are achieved through a novel use of 
instance-dependent (ID, for short) commitment schemes, a new simulation 
technique, and a coin-tossing protocol that is secure under reset attacks, 
which we build on top of a new ID commitment for all SZK. 

Our simulators are black-box and the round complexity of all our constructions 
is O(log κ), which is optimal considering the lower bounds achieved so far for 
black-box concurrent ZK [6,26]. 

We stress that since the very introduction in [5] of the notion of resettable 
ZK, our results are the first in establishing Resettable Statistical Zero Knowledge. 


1 When there are multiple witnesses that can prove membership of an instance in 
a language, in a reset attack we allow the adversarial verifier to force the prover 
to reuse the same randomness for proving the same instance but using a different 
witness. We therefore achieve a stronger definition of resettability than the one used 
in previous work. 

2 If there exists a language L ∈ SZK such that for an infinite sequence of input lengths, 
the worst-case decision problem for L is sub-exponentially hard, Ostrovsky showed 
that there exist non-uniform sub-exponentially hard one-way functions for that 
sequence of input lengths [29]. 

We finally leave open the interesting questions of proving that SZK = rSZK 
unconditionally or under relaxed complexity-theoretic assumptions, and of 
establishing whether resettable statistical ZK arguments are achievable for all NP. 

As a final note, we remark upon the complexity of the verifiers in our proto- 
cols. Historically, the notion of SZK was developed with bounded verifiers (and 
unbounded distinguishers), for example, see [3,37]. Moving in the same 
direction, we obtain our results in this model, where the verifiers are computationally 
bounded. In subsequent literature on SZK, the stronger notion of statistical 
zero-knowledge against unbounded verifiers was developed. In this scenario, the 
notion of resettability seems hard to achieve: unbounded verifiers can compute 
statistical correlations on the fly by making multiple reset queries to the prover. 
We leave the question of constructing such protocols or showing impossibility in 
a setting with unbounded verifiers as an open problem for future work. 


1.2 Technical Difficulties and New Techniques 


We begin by asking the general question: “Why is the problem of constructing re- 
settable statistical zero-knowledge proof systems hard?” The problem lies in the 
fact that the prover has limited randomness and can be reset. Therefore, the prover’s 
messages are essentially a deterministic function of the verifier’s messages, and 
the verifier can probe this function by resetting the prover and thereby obtaining 
information that might be useful for an unbounded distinguisher. We highlight 
the issues by demonstrating why existing techniques fail. The most well-studied 
way of achieving resettable computational zero-knowledge proofs [5] is by using 
a pseudorandom function. In particular, very informally, using this technique 
the prover applies a pseudorandom function to the common input and the 
verifier’s first message (this message is called the determining message, and it 
fixes all future messages of the verifier), and uses the output as its random 
tape. Now, when the verifier resets and changes its determining message, the 
prover’s random tape changes, and thus, intuitively, the verifier does not gain 
any advantage by resetting the prover. However, for our goal of obtaining 
resettable statistical zero 
knowledge, this approach is not sufficient. In fact, intuitively, any protocol (as 
far as we know) in which there exists a message computed using both the wit- 
ness and the randomness, where the randomness is fixed but the witnesses could 
change with theorem statements, cannot be statistically “secure” in the presence of 
reset attacks. Indeed, an adversarial verifier could interact multiple times with 
provers that use a fixed randomness but different statements and witnesses. This 
information can be used by an unbounded distinguisher to establish certain cor- 
relations among the values used in different executions, ultimately breaking the 
statistical ZK property. Because of these restrictions, previously known 
techniques, which were sufficient for resettable [5,1] and statistical ZK P5PORI] 
independently, turn out to be insufficient for achieving both of them 
simultaneously. 
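
The correlation argument above can be made concrete with a deliberately naive
toy. The sketch below is not a protocol from this paper: it assumes a
hypothetical prover message obtained by XORing the fixed random tape with the
witness. Each such message alone is uniformly distributed, yet two executions
on the same tape with different witnesses reveal the XOR of the two witnesses,
which an unbounded (indeed, any) distinguisher can exploit.

```python
import secrets


def witness_dependent_message(random_tape: bytes, witness: bytes) -> bytes:
    """Toy prover message that mixes the fixed random tape with the witness."""
    return bytes(r ^ w for r, w in zip(random_tape, witness))


def reset_attack(witness_a: bytes, witness_b: bytes) -> bytes:
    """Run the toy prover twice on one tape, as a resetting verifier could."""
    tape = secrets.token_bytes(len(witness_a))  # chosen once, reused after reset
    msg_a = witness_dependent_message(tape, witness_a)
    msg_b = witness_dependent_message(tape, witness_b)
    # The tape cancels out, so the result equals witness_a XOR witness_b.
    return bytes(a ^ b for a, b in zip(msg_a, msg_b))


if __name__ == "__main__":
    print(reset_attack(b"witness-one", b"witness-two").hex())
```

The function names and the XOR-based message are illustrative assumptions; the
point is only that fixed randomness combined with a changing witness creates
correlations across executions.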

In light of the intuition above, resettable statistical ZK for non-trivial 
languages might at first sight be considered impossible to achieve. On the 
contrary, we develop a new technique that overcomes the above problems. 

We demonstrate this new technique by first considering a toy version of our 
protocol. The protocol consists of three phases. In the first phase the verifier 
sends a “special” instance-dependent non-interactive (ID, for short) 
commitment of a random string m to the prover. (In this commitment, if the 
prover is lying and x ∉ L, then m will be undefined, while if x ∈ L, then m will 
be unique.) The second phase consists of a PRS preamble [32]. Very roughly, in 
the PRS preamble the verifier commits to random shares of m, which are opened 
depending on the prover’s challenges. Finally, the prover is required to send m 
to the verifier. The prover can obtain m by extracting it from the commitment 
either efficiently using a witness in case of efficient-prover proofs, or running in 
exponential time in case of unbounded-prover proofs. We stress that when the 
theorem being proved is true the message m that can be extracted is unique. 

First, the protocol just described has the following property: every message 
sent by the prover is public coin³ except its last message, which is uniquely 
determined by the first message of the verifier (we use the terminology of [5] 
and refer to it as the determining message). Most importantly, no message 
depends on the witness of the prover. It is this property that allows a 
simulator to generate a transcript that is statistically close to the transcript 
generated in the interaction with a real prover. An honest prover uses a 
pseudorandom function on the common input and the determining message and uses 
the output as its random tape. A simulator can sample the messages from the 
same distribution as the real prover. Finally, the simulator will be able to 
obtain m by using rewinding capabilities, through a variation of a PRS 
rewinding strategy [32]. The need for the variation arises from the fact that a 
simulator that uses pseudo-random coins does not gain anything by rewinding 
(i.e., after a rewind it would re-send the same message). We deal with this 
problem by having the simulator use pseudorandom coins for some messages while 
using pure random coins for others. We elaborate on this in § 4. This toy 
version, described above, illustrates the key ideas that we use in achieving 
simultaneously both resettable and statistical zero knowledge. To transform our 
toy version into a full proof system, for even the most basic languages that we 
consider in this paper, we need an extra instance-dependent primitive. But we 
defer this discussion to § 3 and § 5. 

Second, our protocol also has the property that if the theorem is false then 
the prover has almost no chance (in the information-theoretic sense) of sending 
an accepting last message. This follows from the fact that the ID commitment 
from the verifier is statistically hiding. This property guarantees soundness. 

Unfortunately, the above ideas are insufficient to prove that rSZK = SZK. 
This is because statistically hiding non-interactive ID commitments, introduced 
by Chailloux, Ciocan, Kerenidis and Vadhan [7] for SZK, are “honest-sender.” To 
force the sender into using purely random coins we need a coin-flipping protocol 
secure against resetting senders. For this coin-flipping protocol, an ID 
commitment scheme which is computationally binding with respect to a resetting 
sender for instances in the language and statistically hiding for instances not 
in the language suffices. We will use some techniques introduced by Barak, 
Goldreich, Goldwasser and Lindell in [1] on top of a previous result of Ong and 
Vadhan [28] for obtaining such an ID commitment scheme. 


3 Looking ahead, we will use a pseudorandom function to generate these messages. 

However, a more subtle problem arises in the use of pseudorandom functions. 
To obtain security against reset attacks, the coin-flipping message played 
by the receiver of the commitment must be computed by using a pseudoran- 
dom function. This again turns out to be insufficient for our analysis since the 
use of the pseudorandom function does not guarantee that the outcome of the 
coin-flipping protocol is a uniform string to be used in the honest-sender non- 
interactive ID commitment scheme. In order to solve this additional problem, 
we use sub-exponentially hard pseudorandom functions (constructed from sub- 
exponentially hard one-way functions). These stronger primitives have the addi- 
tional property that they are secure against sub-exponential size circuits. This 
technique is referred to as complexity leveraging, and has been previously used in 
various applications (e.g., [5[23/2]7 19131138] ). However, we stress that in all our 
constructions, the simulator runs in expected polynomial time, and the above 
assumptions play a role only inside our security proof. 

Before concluding this section, we point out an important difference between 
our approach and ideas developed by Micciancio, Ong, Sahai and Vadhan in [25], 
where the authors give unconditional constructions of concurrent statistical zero- 
knowledge proofs for many non-trivial problems. As in their construction, we 
use ID commitments, but our general approach and overall protocol differ from 
theirs. In [25], a compiler is constructed that (using ID commitments) provides 
a generic way to construct statistical zero-knowledge protocols. But, as 
pointed out earlier, such a compiling technique along with standard 
resettability techniques [5] is not sufficient for us. Therefore, we develop 
our zero-knowledge protocol from scratch. This is needed because obtaining 
resettability along with statistical zero knowledge is different from and (as 
pointed out earlier) harder than obtaining concurrent statistical zero 
knowledge. We further note 
that in fact our techniques imply that SZK = cSZK unconditionally. We refer 
the reader to the full version for further discussion on this. 


Road map. We start by giving some preliminary definitions in § 2. We use three 
ID primitives in this paper; we elaborate on those in § 3. In § 4 we construct a 
resettable statistical ZK proof secure against partially honest verifiers. Then 
in § 5 we remove this limitation for certain classes of languages. In § 6 we 
construct the proof system that works for all languages in SZK. 


2 Notation and Tools 


We say that a function is negligible in the security parameter κ if it is 
asymptotically smaller than the inverse of any fixed polynomial. Otherwise, the 
function is said to be non-negligible in κ. We say that an event happens with 
overwhelming probability if it happens with a probability p(κ) = 1 − ν(κ) where 
ν(κ) is a negligible function of κ. In this section, we provide an overview of 
the primitives used in this paper. Formal definitions can be found in the full 
version [17]. 
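
For reference, the two notions just defined can be written as follows. This is
the standard formalization; the constants c and κ_0 are the usual bound
variables and are not notation taken from this paper.

```latex
\[
\nu \text{ is negligible} \iff
\forall c > 0 \;\, \exists \kappa_0 \;\, \forall \kappa > \kappa_0 : \;
\nu(\kappa) < \kappa^{-c},
\qquad
p(\kappa) = 1 - \nu(\kappa) \text{ is overwhelming.}
\]
```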

Resettable/Statistical Zero Knowledge. In this paper we consider resettable [5] 
and statistical [19,3,33] notions of zero-knowledge. The notion of resettability 
requires that a protocol remains zero-knowledge even if the verifier can reset 
the prover. The notion of statistical zero knowledge provides security guarantees 
against unbounded distinguishers. This paper constructs resettable statistical 
zero-knowledge proof systems. In other words we try to achieve both the reset- 
tability and the statistical guarantees simultaneously. 


PRS Preamble from [32]. A PRS preamble is a protocol between a committer C 
and a receiver R that consists of two main phases, namely: 1) the commitment 
phase, and 2) the challenge-response phase. 

Let k be a parameter that determines the round-complexity of the protocol. 
Then, in the commitment phase, very informally, the committer commits to 
a secret string σ and k² pairs of its 2-out-of-2 secret shares. The 
challenge-response phase consists of k iterations, where in each iteration, 
very informally, the committer “opens” k shares, one each from k different 
pairs of secret shares as chosen by the receiver. 

The goal of this protocol is to enable the simulator to be able to rewind and 
extract the “preamble secret” σ with overwhelming probability. In the concurrent 
setting, rewinding can be difficult since one may rewind to a time step that pre- 
cedes the start of some other protocol [16]. However, as it has been demonstrated 
in [32], there is a fixed “time-oblivious” rewinding strategy that the simulator can 
use to extract the preamble secrets from every concurrent cheating committer, 
except with negligible probability. Moreover, this works as long as k = Ω(log κ) 
for some positive ε. We refer to this as the PRS rewinding strategy or the PRS 
simulation strategy. We refer the reader to [32] for more details. 
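
The role of the 2-out-of-2 shares can be sketched as follows. The code is a
simplified illustration and not the preamble of [32]: the commitments to the
shares are omitted, and the helper names (make_shares, respond, extract) are
assumptions made for this example. It shows only why two different challenges
for the same iteration, as obtained by rewinding, reveal the preamble secret.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_shares(secret: bytes, k: int):
    """Return a k x k grid of pairs (s0, s1) with s0 XOR s1 == secret."""
    grid = []
    for _ in range(k):
        row = []
        for _ in range(k):
            s0 = secrets.token_bytes(len(secret))
            row.append((s0, xor(secret, s0)))
        grid.append(row)
    return grid


def respond(grid, iteration: int, challenge):
    """Open one share from each of the k pairs of the given iteration."""
    return [grid[iteration][j][bit] for j, bit in enumerate(challenge)]


def extract(challenge_a, answer_a, challenge_b, answer_b):
    """Combine two answers for one iteration; None if the challenges coincide."""
    for j, (bit_a, bit_b) in enumerate(zip(challenge_a, challenge_b)):
        if bit_a != bit_b:
            # Both shares of pair j are now known, so their XOR is the secret.
            return xor(answer_a[j], answer_b[j])
    return None
```

A single answer consists of one uniformly random share per pair and so carries
no information about the secret; only a rewound second answer with a different
challenge bit in some position completes a pair.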


Sub-exponentially hard one-way functions. A sub-exponentially hard one-way 
function is a one-way function that is hard to invert even by circuits of 
sub-exponential size (2^{κ^ε} for some 1 > ε > 0). Such functions imply the 
existence of sub-exponentially hard pseudorandom functions. We stress that we 
need this assumption only for proving that SZK = rSZK. 


3 Instance-Dependent Commitments and Proofs 


In this section we construct three instance-dependent primitives, that we use in 
this paper: (1) a non-interactive instance-dependent commitment scheme, (2) an 
interactive instance-dependent commitment scheme, and finally (3) an instance- 
dependent argument system. 


Non-Interactive Instance-Dependent Commitment Scheme. An important tool 
that we will re-define, construct and use in our proof systems, is that of “spe- 
cial” non-interactive instance-dependent (ID, for short) commitment schemes. A 
commitment scheme allows one party (referred to as the sender) to commit to 
a value while keeping it hidden, with the ability to reveal the committed value 
later. Commitments also have the property that once the sender commits to a 
value, it cannot change its mind later. This property is referred to as the 
binding property. In certain settings, commitment schemes for which these 
properties are not required to hold simultaneously suffice. Such schemes are 
parameterized by a value x and a language L, and either the binding or the 
hiding property holds depending upon the membership of x in L. These schemes 
are referred to as ID commitment schemes [7]. Typically, the ID commitment 
schemes that have been considered in the literature require the hiding property 
to hold when x ∈ L and the binding property to hold otherwise. We actually need 
the reverse properties, i.e., we need the hiding property when x ∉ L and the 
binding property otherwise. 

In particular we consider an ID commitment scheme with further special 
properties. We require that the commitment scheme be statistically binding for 
x ∈ L and statistically hiding otherwise. In other words we want binding and 
hiding properties to hold against unbounded adversaries. Also we require that 
our ID commitment scheme be secure against a resetting sender. This always 
holds when the commitment scheme is non-interactive. All the non-interactive 
ID commitments that we consider are statistically hiding. So to simplify nota- 
tion we refer to a non-interactive instance-dependent commitment scheme with 
perfect (binding holds with probability 1) binding and statistical hiding as a 
perfect non-interactive ID commitment. Similarly, we refer to a non-interactive 
instance-dependent commitment scheme with statistical binding and statistical 
hiding as a statistical non-interactive ID commitment. 

Since the commitment is statistically binding, when x ∈ L, the committed 
value can always (with overwhelming probability) be extracted in exponential 
time. Extractability becomes tricky, however, when the extractor has to run in 
polynomial time. We will call an ID commitment scheme efficiently extractable 
if, when x ∈ L, there exists an extractor that takes as input a witness for 
the membership of x in L and the commitment, and outputs the committed message 
in polynomial time. 
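
The three requirements (binding for x ∈ L, hiding for x ∉ L, efficient
extraction with a witness) can be illustrated with a textbook-style bit
commitment for quadratic non-residuosity. This is an assumption-laden toy and
not necessarily the scheme used in this paper: the modulus N = 77 is far too
small for security, the factorization (P, Q) plays the role of the witness, and
a commitment to a bit under instance x with coins r is taken to be
r² · x^bit mod N.

```python
from collections import Counter
from math import gcd

P, Q = 7, 11          # toy primes; their knowledge plays the role of the witness
N = P * Q
UNITS = [r for r in range(1, N) if gcd(r, N) == 1]


def commit(x: int, bit: int, r: int) -> int:
    """Commit to `bit` under instance x with coins r: r^2 * x^bit mod N."""
    return (r * r * pow(x, bit, N)) % N


def is_square(c: int) -> bool:
    """Euler's criterion modulo each prime factor (requires knowing P and Q)."""
    return pow(c, (P - 1) // 2, P) == 1 and pow(c, (Q - 1) // 2, Q) == 1


def extract(c: int) -> int:
    """Extraction for a non-residue x: squares open to 0, non-squares to 1."""
    return 0 if is_square(c) else 1


def distribution(x: int, bit: int) -> Counter:
    """Exact distribution of commitments to `bit` over all coins r."""
    return Counter(commit(x, bit, r) for r in UNITS)
```

With x = 6 (a non-residue modulo both 7 and 11, hence in the language) the
commitments to 0 and to 1 are disjoint and extract recovers the bit; with x = 4
(a square, hence outside the language) the two distributions coincide exactly,
so the commitment is perfectly hiding.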

It turns out that perfect non-interactive ID commitment schemes are actually 
known to exist for all languages in co-RSR PABBA. co-RSR is the class 
of languages such that the complement of each of these languages is random 
self-reducible. Another class of languages that is amenable to our techniques is 
the class of languages that are in SZK and that admit a hash proof system. 
Observing that these languages imply instance-dependent primitives that are 
analogous to ID commitments described above, we get efficient-prover resettable 
statistical ZK proof systems for this interesting class. In particular, for DDH 
(the language that consists of all Diffie-Hellman quadruples and that admits two 
different witnesses for proving the membership of a quadruple to the language), 
we give a separate ID commitment scheme highlighting how our techniques work 
with multiple witnesses. 

We notice that for the whole SZK we only know a weak form of statistical non- 
interactive ID commitment scheme where statistical binding holds with respect 
to honest senders only. The details have been provided in the full version. 

We will denote the extractable perfect (or, statistical) non-interactive ID 
commitment scheme by COM. The commitment function for a value x and language 
L will be denoted by C_{L,x}. Also we use the notation C_{L,x}(m; r) for the 
function used to generate the commitment to message m ∈ {0,1} using random coins 
r ∈ {0,1}. The extractability property of these commitments is very important 
for our constructions. 


Interactive ID Commitment Scheme. We use an interactive ID commitment 
scheme COM_{L,x} = (S_x, R_x), where S_x and R_x are the sender and the receiver 
respectively, with common input x. This ID commitment scheme is computationally 
binding against a resetting sender when the instance x is in the language, 
and is statistically hiding otherwise. Very roughly, we construct such a scheme 
by using the constant-round public-coin ID commitment scheme of [28]. This 
scheme has statistical binding and statistical hiding properties. We make it 
secure under resetting senders by having the receiver determine its messages by 
applying a pseudo-random function (similarly to Proposition 3.1 in [1]) to the 
transcript so far. Because of this, the statistical binding property is degraded 
to computational binding.⁴ We stress that unlike the non-interactive ID 
commitment described earlier, we will not need any extractability from these 
commitments. We obtain this new ID commitment scheme for all of SZK under the 
assumption that one-way functions exist. The details have been provided in the 
full version. 


Instance-Dependent Argument System (PrsSWI_x, VrsSWI_x). We will need an 
instance-dependent argument system (PrsSWI_x, VrsSWI_x) where PrsSWI_x and 
VrsSWI_x are the prover and the verifier respectively, with common inputs x and 
a statement ξ.⁵ When x is in the language, we want (PrsSWI_x, VrsSWI_x) to be 
a resettably sound argument of knowledge for NP. In this case, very roughly, 
(PrsSWI_x, VrsSWI_x) has the additional property that soundness holds even 
when the prover can reset the verifier. If instead x is not in the language then 
(PrsSWI_x, VrsSWI_x) must be statistically witness indistinguishable. We 
construct this argument system by instantiating Blum’s Hamiltonicity protocol 
with the constant-round public-coin ID commitment scheme of [28]. We make it 
resettably sound by using a pseudorandom function [1]. Details, definitions and 
constructions are given in the full version. 


4 However, looking ahead we note that, computational binding will be sufficient for our 
applications since the role of the sender will be played by a polynomially bounded 
party. 

5 In general, in proof systems when an ID commitment is used, it is parameterized by 
the theorem statement ξ being proven. In our case the ID commitment is actually 
parameterized by a different value x. Looking ahead, x would be the theorem 
statement of an interactive proof system that uses the sub-protocol 
(PrsSWI_x, VrsSWI_x) to prove the NP statement ξ. 


4 Resettable Partially Honest-Verifier Statistical Zero 
Knowledge 


We aim at constructing a resettable statistical zero-knowledge proof system. We 
start by building a simpler protocol which is resettable statistical zero knowledge 
only against a restricted class of adversarial verifiers. In subsequent sections, we 
build upon this simpler protocol to achieve our general results. The adversarial 
verifiers that we consider here are restricted to “act honestly” but only in a 
limited manner. We call such verifiers partially honest. As pointed out in § 3, we 
use a non-interactive ID commitment scheme. Looking ahead, in our protocol this 
commitment is used by the verifier to commit to certain messages. A partially 
honest verifier is required to behave honestly when computing the commitment 
function, using pure randomness to commit to messages. Besides this it can 
cheat in any other way. We state this restriction more concretely after we have 
described the protocol. 

We begin by constructing a concurrent statistical zero-knowledge proof system 
secure against such partially honest adversaries, and then transform it into a 
resettable statistical zero-knowledge proof system under the same restricted class 
of adversarial verifiers. 


Concurrent Partially Honest-Verifier SZK. We start by informally describing 
the protocol cpHSZK of Fig. 1. It consists of three phases. The first phase, called 
the Determining Message Phase, consists of the verifier sending a commitment to 
a string m to the prover. We use the extractable non-interactive ID commitment 
scheme described earlier. The second phase is roughly a PRS preamble [32] 
and we refer to it as the PRS Phase. Note that some commitments are made 
in the PRS preamble, but we lump these with the commitment to m, in the 
Determining Message Phase itself. Finally the prover sends to the verifier the 
value m. This is referred to as the Final Message. An adversarial verifier, denoted 
by V*, is called a partially honest verifier if it generates the non-interactive ID 
commitments of the Determining Message Phase “honestly.” This requires that 
these ID commitments (1) are “well-formed” and (2) have unique⁶ openings 
(except with negligible probability). 

We begin by briefly sketching why cpHSZK is a concurrent statistical zero- 
knowledge proof system for L. Full details of the proof are in the full version. 
Completeness follows from the binding property of COM: when x ∈ L, the 
commitments in the Determining Message Phase are statistically binding with 
unique openings with overwhelming probability. Thus, the prover can extract the 
unique message m and make the verifier accept in the Final Message Phase. For 
soundness, note that when x ∉ L, the commitments in the first phase are 
statistically hiding. Thus, m committed to in the Determining Message Phase is 
information theoretically hidden from a cheating prover (also shares received 
during the 


6 It follows from the description in the full version that a perfect non-interactive 
ID commitment always has a unique opening. On the other hand, an honest-sender 
statistical non-interactive ID commitment has a unique opening with overwhelming 
probability, for honest senders only. 


Common Input: x ∈ L ∩ {0,1}^n, k = ω(log κ) and n = poly(κ), for a security 
parameter κ. 

Secret Input to P: Witness w such that (x, w) ∈ R_L (not needed in case of 
unbounded prover). 


1. Determining Message (V → P): V chooses a message m at random and computes 
   α = C_{L,x}(m; ρ_0) for random coins ρ_0. For 1 ≤ i ≤ k and 1 ≤ j ≤ k, 
   V randomly chooses σ^0_{i,j} and σ^1_{i,j} such that 
   σ^0_{i,j} ⊕ σ^1_{i,j} = m. For each (i, j, b), where 1 ≤ i ≤ k, 1 ≤ j ≤ k and 
   b ∈ {0,1}, V randomly chooses coins ρ^b_{i,j} and computes the commitment 
   α^b_{i,j} = C_{L,x}(σ^b_{i,j}; ρ^b_{i,j}). Finally, V sends all the 
   commitments α, α^0_{1,1}, ..., α^1_{k,k} to the prover. 

2. PRS Phase (V ⇔ P): For 1 ≤ l ≤ k: 
   (a) P sends b_l chosen randomly in {0,1}^k to V. 
   (b) Let b_l^i be the i-th bit of b_l. V sends the openings of 
       α^{b_l^1}_{l,1}, ..., α^{b_l^k}_{l,k}. 
   (c) If the opening sent by the verifier is invalid, then P sends ABORT to 
       the verifier and aborts the protocol. 

3. Final Message (P → V): P runs the extractor associated to the ID commitment 
   of the Determining Message Phase. If the extractor aborts then P aborts, 
   else P sends the output m′ of the extractor to V, who accepts if m′ = m. 


Fig. 1. Concurrent Partially Honest-Verifier Statistical Zero-Knowledge Proof System: 
cpHSZK 


preamble do not give any information), and therefore, it can convince the verifier 
only with negligible probability. 
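
The message flow of Fig. 1 can be exercised end to end with a toy
instantiation. The sketch below makes several assumptions that are not part of
the paper: it plugs in a bitwise quadratic-residuosity commitment with the
insecure modulus N = 77 and instance X = 6 (a non-residue, so the instance is
in the language), it runs both honest parties inside one function, and it
models neither concurrency nor adversarial behavior.

```python
import secrets
from math import gcd

P, Q = 7, 11                      # toy witness: the factorization of N
N, X = P * Q, 6                   # X is a non-residue mod N, i.e. "x in L"
UNITS = [r for r in range(1, N) if gcd(r, N) == 1]


def commit(bits, coins):
    """Bitwise toy commitment c_i = r_i^2 * X^{bit_i} mod N."""
    return [(r * r * pow(X, b, N)) % N for b, r in zip(bits, coins)]


def fresh_commit(bits):
    """Commit with fresh coins and return (commitment, coins)."""
    coins = [secrets.choice(UNITS) for _ in bits]
    return commit(bits, coins), coins


def is_square(c):
    return pow(c, (P - 1) // 2, P) == 1 and pow(c, (Q - 1) // 2, Q) == 1


def extract(commitment):
    """Prover-side extraction of the committed bits using the witness."""
    return [0 if is_square(c) else 1 for c in commitment]


def random_bits(length):
    return [secrets.randbelow(2) for _ in range(length)]


def run_session(k=4, length=8):
    # 1. Determining message: V commits to m and to k*k pairs of XOR shares.
    m = random_bits(length)
    alpha, _ = fresh_commit(m)
    openings, commitments = {}, {}
    for i in range(k):
        for j in range(k):
            share0 = random_bits(length)
            share1 = [a ^ b for a, b in zip(m, share0)]
            for bit, share in ((0, share0), (1, share1)):
                c, coins = fresh_commit(share)
                commitments[i, j, bit] = c
                openings[i, j, bit] = (share, coins)
    # 2. PRS phase: k iterations of a random k-bit challenge and k openings.
    for l in range(k):
        challenge = random_bits(k)
        for j, bit in enumerate(challenge):
            share, coins = openings[l, j, bit]       # opening sent by V
            if commit(share, coins) != commitments[l, j, bit]:
                return False                         # P would send ABORT
    # 3. Final message: P extracts m' from alpha; V accepts iff m' == m.
    return extract(alpha) == m
```

Note that the final message is computed from alpha and the witness only, never
from the prover's challenges, which is the property the resettable variant
relies on.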

To argue zero knowledge, we use the rewinding strategy of [32]. Using the 
PRS rewinding strategy we can construct a simulator that obliviously rewinds 
the verifier and is guaranteed (except with negligible probability) to obtain the 
opening m committed to in the Determining Message Phase, before the end of 
the PRS Phase for every session (except with negligible probability) initiated by 
the cheating verifier. Once the cpHSZK simulator knows the message m commit- 
ted in the Determining Message Phase, it can play it back to the verifier in the 
Final Phase. 

Note that to prove zero knowledge, we crucially use the fact that the verifier 
is partially honest. First, we need the commitments sent by the verifier to be 
correctly formed. This is to make sure that the commitments are done in 
accordance with the specifications for the first message of the PRS preamble. 
Secondly, we need these commitments to have unique openings with overwhelming 
probability. If, for example, the verifier’s commitment to m in the Determining 
Message Phase has two openings, then the simulation would fail. Indeed, an 
unbounded prover, unable to decide which is the right opening, would always 
abort, while the simulator would still extract some message from the PRS Phase 
and send that to the verifier in the Final Phase. The case of an efficient 
prover instead would 
result in extracting a message that could depend on the witness used, while 
the one obtained by the simulator would not depend on the witness, therefore 
potentially generating a distinguishable deviation in the transcript. 




Resettable Partially Honest-Verifier SZK. We now exploit a key property of 
cpHSZK and transform it into a resettable statistical zero-knowledge proof sys- 
tem secure against partially honest verifiers. We note that the final message of 
cpHSZK depends only on the first message of the verifier. In particular, it de- 
pends neither on the random tape of the prover, nor on its witness. Also messages 
of the prover in the PRS phase are just random strings. Thus, very informally, 
an adversarial verifier cannot obtain any advantage by resetting the prover, as 
after every reset, the verifier will get the same message back in the final round. 
This is a crucial fact that allows us to achieve resettability. 


Common Input: $x \in L \cap \{0,1\}^n$, $k = \omega(\log \kappa)$, $n = \mathrm{poly}(\kappa)$ for a security
parameter $\kappa$.

Secret Input to P: Witness $w$ such that $(x, w) \in R_L$ (not needed in case of
unbounded prover).


1. Determining Message: Same as in Fig. 1.
2. PRS Phase ($V \Leftrightarrow P$): P chooses a random seed $s$, and sets
   $\omega = f_s(x, \alpha, \alpha^0_{1,1}, \ldots, \alpha^1_{k,k})$. Now P divides $\omega$ into $k$ blocks of $k$ bits each, i.e.,
   $\omega = \omega^1 \circ \cdots \circ \omega^k$. For $1 \le l \le k$,
   (a) P sends $\omega^l$ to V.
   (b) Same as in Fig. 1, Step 2(b).
   (c) Same as in Fig. 1, Step 2(c).
3. Final Message: Same as in Fig. 1.


Fig. 2. Resettable Statistical Partially Honest Verifier Zero-Knowledge Proof System 
rpHSZK 


The transformed protocol, called rpHSZK (Fig. 2), is the same as cpHSZK, 
except for one difference: in the PRS Phase, instead of sending random challenges 
in Step 2(a), the prover uses pseudorandom challenges. The prover chooses a 
random seed $s$ for selecting a function from a PRF family $\{f_s\}_{s \in \{0,1\}^*}$, and sets
$\omega$ as the output of $f_s(\cdot)$ evaluated on the message received during the Determining
Message Phase. The prover uses this $\omega$ as its random tape for the PRS phase. A
modification of the PRS simulation where the simulator uses both pseudorandom 
and random messages during the preamble, along with other known tricks [5] 
allows us to prove that this protocol is a resettable statistical zero-knowledge 
proof system with respect to partially honest verifiers. 
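
To make the reset argument concrete, the following Python sketch derives the prover's PRS-phase challenges by applying a PRF to the verifier's determining message. It is an illustration under assumptions that are not in the paper: HMAC-SHA256 stands in for the PRF family, the block length is arbitrary, and the names `prs_challenges`, `seed` and `determining_msg` are hypothetical.

```python
import hashlib
import hmac


def prs_challenges(seed: bytes, determining_msg: bytes, k: int, block_bytes: int = 16) -> list:
    """Derive k PRS-phase challenge blocks from PRF_seed(determining_msg).

    HMAC-SHA256 is an assumed stand-in for the PRF f_s; block_bytes is an
    arbitrary illustrative block length.
    """
    # Hash the message first so the index/counter suffix added below cannot
    # be confused with message bytes.
    digest = hashlib.sha256(determining_msg).digest()
    blocks = []
    for l in range(k):
        stream = b""
        counter = 0
        while len(stream) < block_bytes:
            data = digest + l.to_bytes(4, "big") + counter.to_bytes(4, "big")
            stream += hmac.new(seed, data, hashlib.sha256).digest()
            counter += 1
        blocks.append(stream[:block_bytes])
    return blocks


seed = b"\x01" * 32  # prover's PRF seed, unchanged across resets
first_run = prs_challenges(seed, b"determining message", k=4)
after_reset = prs_challenges(seed, b"determining message", k=4)
other_msg = prs_challenges(seed, b"another determining message", k=4)
```

In this sketch a reset followed by the same determining message reproduces exactly the same challenges, so replaying a session gives the verifier nothing new, while a different determining message leads to unrelated-looking challenges.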


5 Resettable Statistical ZK from Perfect Non-interactive 
ID Commitments 


In this section we consider languages that admit perfect non-interactive ID com- 
mitments and we construct a resettable statistical ZK proof system which is 
secure against all malicious verifiers. 
Let L be a language that admits a perfect non-interactive ID commitment
scheme. We extend the proof system rpHSZK for L to handle arbitrary malicious 
verifiers. The main idea is to enforce “partially honest behavior” on the malicious 
verifier. We recall that the partially honest restriction on a verifier required that 
the verifier generates commitments honestly. More specifically, we required that 
these commitments have unique openings and are correctly constructed. A fully 
malicious verifier however can deviate and compute commitments that do not 
have the prescribed form. Therefore, the only concern we have is to make sure 
that commitments are correctly generated. We enforce this by modifying rpHSZK 
and adding an extra step to it. This step requires that the verifier proves to the 
prover that shares constructed in Step 1 (as part of the Determining Message) 
are correct. If this proof is accepted then the prover can conclude that the first 
message of the verifier is indeed honestly generated and the malicious verifier 
is forced into following the desired partially honest behavior. In our protocol 
the verifier uses an instance-dependent argument system $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$
such that: when $x \in L$, $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$ is a resettably sound argument
of knowledge, while when $x \notin L$, $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$ is statistically witness
indistinguishable. Since the protocol is resettably sound the malicious verifier
cannot go ahead with incorrect commitments even when it can reset the prover.
For the protocol see Fig. 3.


Sub-protocol: $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$ is a resettably sound argument of knowledge
when $x \in L$ and a statistical witness indistinguishable argument when $x \notin L$.
Common Input: $x \in L \cap \{0,1\}^n$, $k = \omega(\log \kappa)$, $n = \mathrm{poly}(\kappa)$ for a security
parameter $\kappa$.

Secret Input to P: Witness $w$ such that $(x, w) \in R_L$ (not needed in case of
unbounded prover).


1. Determining Message: Same as in Fig. 2.

2. Proof of Consistency ($V \Rightarrow P$): V and P run $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$, where
   V plays the role of $\mathrm{PrsSWI}_x$, and P plays the role of $\mathrm{VrsSWI}_x$. V proves to P
   knowledge of $m$, $\sigma^b_{i,j}$, $\rho_0$, $\rho^b_{i,j}$ for $1 \le i, j \le k$, $b \in \{0,1\}$, such that:
   (a) $\alpha = C_{L,x}(m; \rho_0)$, and,
   (b) $\alpha^b_{i,j} = C_{L,x}(\sigma^b_{i,j}; \rho^b_{i,j})$ for each $1 \le i, j \le k$ and $b \in \{0,1\}$, and,
   (c) $\sigma^0_{i,j} \oplus \sigma^1_{i,j} = m$ for $1 \le i, j \le k$.

3. PRS Phase: Same as in Fig. 2.

4. Final Message: Same as in Fig. 2.


P aborts the protocol in case any proof from the verifier does not accept or
some message is not well formed.
Notice that P uses two different seeds for the PRF $f$ (one in Step 2 and the
other one in Step 3).


Fig. 3. Resettable Statistical Zero-Knowledge from Perfect Non-Interactive ID Com- 
mitments: rSZK 
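
The share relation that the verifier must prove in Step 2(c) of Fig. 3 can be illustrated with a short Python sketch. It only models the XOR relation between the shares and the message; the commitments of items (a) and (b) and the argument system itself are not modeled, and the helper names `make_shares` and `shares_consistent` are hypothetical.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_shares(m: bytes, k: int) -> dict:
    """Return {(i, j): (sigma0, sigma1)} with sigma0 XOR sigma1 == m for 1 <= i, j <= k."""
    shares = {}
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            sigma0 = secrets.token_bytes(len(m))
            shares[(i, j)] = (sigma0, xor_bytes(sigma0, m))
    return shares


def shares_consistent(m: bytes, shares: dict) -> bool:
    """Check the relation of item (c): every pair of shares XORs to m."""
    return all(xor_bytes(s0, s1) == m for s0, s1 in shares.values())


m = secrets.token_bytes(8)
shares = make_shares(m, k=3)
honest_ok = shares_consistent(m, shares)

# A verifier that tampers with a single share violates the relation.
tampered = dict(shares)
s0, s1 = tampered[(1, 1)]
tampered[(1, 1)] = (s0, xor_bytes(s1, b"\x01" * 8))
tampered_ok = shares_consistent(m, tampered)
```

Each share on its own is uniformly distributed, which is why revealing a single share of a pair in the PRS phase does not reveal the message.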
Application to co-RSR and hash proof systems. Languages in co-RSR and
DDH have perfect non-interactive ID commitment schemes. Thus, from the 
discussion above, it follows that these languages have resettable statistical zero- 
knowledge proofs. For languages in SZK that admit hash proof systems, a minor 
modification of our resettable statistical zero-knowledge protocol suffices. The 
details are provided in the full version [17].


6 Resettable Statistical ZK for All Languages in SZK 


In this section we construct the general proof system, which is resettable
statistical zero knowledge for all languages that have a statistical zero-knowledge
proof. As in the previous section, we start with a resettable partially
honest-verifier statistical ZK proof system, but we now consider all languages in SZK and
construct a resettable statistical ZK proof system that is secure against all
malicious verifiers.

Let L be a language that admits an honest sender statistical non-interactive 
ID commitment scheme COM. We extend the proof system rpHSZK for L to 
handle arbitrary malicious verifiers. The main idea is to enforce “partially honest 
behavior” on the malicious verifier. We recall that the partially honest restric- 
tion on a verifier required that the verifier uses COM to generate commitments 
honestly. More specifically, we required that these commitments are correctly 
constructed and have unique openings. The first requirement can be handled 
as in the previous section, i.e., by having the verifier prove to the
prover that shares constructed in Step 1 (as part of the Determining Message)
are correct. We use the ID argument system $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$ to achieve this.
The problem of uniqueness is trickier, and we discuss it next.

The difficulty lies in the fact that the statistical non-interactive ID commit- 
ment scheme for all languages in SZK [7], only works with respect to honest 
senders. Indeed, if the sender chooses the randomness for the commitment uni- 
formly, then, with overwhelming probability, the computed commitment has a 
unique valid opening. However, a malicious sender could focus on a set of negligible
size, $B$, of bad random strings $r$, such that $C_{L,x}(m; r)$ does not have a
unique opening. If a malicious verifier (that plays as sender of this commitment 
scheme) is able to pick random strings from B, then the real interaction and the 
simulation can be easily distinguished. In the real protocol, the prover tries to 
invert the commitment $\alpha$, finds it does not have a unique opening, and aborts.
In the simulation, the simulator extracts some message m from the PRS phase, 
and sends m as the final message. As the simulator is polynomially bounded, 
it can not detect if the commitment has a unique opening or not. To use this 
commitment scheme, we must somehow ensure that the verifier does not use bad 
randomness for its commitments. We do this by adding a special coin-flipping 
subprotocol at the beginning of the protocol. However, because of reset attacks, 
the coin-flipping subprotocol introduces several technical problems. 

We begin by describing our coin-flipping protocol. The coin-flipping protocol 
requires a commitment scheme such that computational binding holds against 
resetting senders when $x \in L$ and statistical hiding holds when $x \notin L$. We use
the interactive ID commitment scheme $\mathrm{COM}_x = (S_x, R_x)$. The coin flipping
proceeds as follows: first the verifier commits to a random string $r_1$. Let the
transcript of the interactive commitment be $c$. Then the prover applies the
sub-exponentially hard PRF to $c$ and obtains $r_2 = f_s(c)$, which is sent to the verifier. The
randomness that the verifier will use for the non-interactive ID commitment
is $r_1 \oplus r_2$. For technical reasons, the verifier also needs to prove knowledge of
$r_1$ after it has committed to $r_1$. We use the interactive ID argument system
$(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$ for this.
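
The message flow of this coin flipping can be sketched in Python as follows. This is a toy model under loud assumptions: a salted SHA-256 hash stands in for the interactive instance-dependent commitment (so the "transcript" $c$ is a single string), HMAC-SHA256 stands in for the sub-exponentially hard PRF, and the proof of knowledge of $r_1$ is omitted. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

N = 32  # byte length of r_1 and r_2 (illustrative choice)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def commit(r1: bytes, opening: bytes) -> bytes:
    """Toy hash-based commitment; NOT the instance-dependent scheme of the paper."""
    return hashlib.sha256(opening + r1).digest()


def prover_reply(seed: bytes, c: bytes) -> bytes:
    """r_2 = f_s(c), with HMAC-SHA256 as an assumed stand-in for the PRF."""
    return hmac.new(seed, c, hashlib.sha256).digest()


# Verifier: commit to a random r_1.
r1 = secrets.token_bytes(N)
opening = secrets.token_bytes(N)
c = commit(r1, opening)

# Prover: answer deterministically from the commitment transcript.
seed = secrets.token_bytes(N)
r2 = prover_reply(seed, c)

# Randomness the verifier must use for its non-interactive commitments.
coins = xor_bytes(r1, r2)
```

Because `prover_reply` is a deterministic function of `c`, resetting the prover and replaying the same commitment always yields the same `r2`; the verifier can obtain a fresh `r2` only by changing its commitment.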

Next we highlight the reasons behind the use of sub-exponentially hard pseudorandom
functions (PRFs) in our construction. Let $\alpha$ be the statistical non-interactive
ID commitment of some message $m$ sent by the verifier. There are
two ways in which $\alpha$ might not have a unique opening. In the first case, a malicious
$V^*$, after looking at the prover's response $r_2$, might use an opening of $c$ such
that $r_1 \oplus r_2 \in B$. This however would violate the computational binding of the
interactive ID commitment scheme secure against resetting senders used in the
coin flipping, thus this event occurs with negligible probability. The second case
is more subtle. It might be possible that, by performing reset attacks, the verifier
can study the behavior of the PRF and then succeed in obtaining
$r_1 \oplus r_2 \in B$ with non-negligible probability (even though the polynomial-time
$V^*$ does not know the two openings). In this case, we cannot construct
a polynomial-time adversary that breaks $f_s$, as we cannot efficiently decide whether
$r \in B$. This is where we need the sub-exponential hardness of the one-way function
and in turn of the PRF. As $|B|$ is only $2^{\ell}$ while the size of the set of all
random strings is $2^{L}$, where $\ell = o(L)$, we can give the entire set $B$ as input to the
sub-exponential size circuit that aims at breaking the PRF. The circuit can now
check whether the string $r$ is a bad string by searching through its input. Notice
that one can give as input to the circuit the whole $B$ for each of the polynomial
number of statements (since for each $x$ there can be a different $B$) on which the
reset attack is applied. This circuit still has sub-exponential size $2^{o(L)}$ and
breaks the PRF, which contradicts the sub-exponential hardness of the PRF.

Completeness follows from the fact that when $x \in L$, with overwhelming
probability, the commitment $\alpha$ in the determining message will have a unique
opening. Thus, the prover will be able to extract the committed message and 
send it as the final message to the verifier, that will accept. 

The statistical resettable zero-knowledge property of our protocol follows from the
same argument. Indeed, when $x \in L$ even a resetting verifier cannot cheat during
the proofs in Steps 1(c) and 3. Moreover, the above discussion about the security 
of the coin-flipping protocol implies that a resetting adversarial verifier is forced 
into following partially honest behavior when computing the non-interactive ID 
commitments. 

Finally, we look at soundness. Note that when $x \notin L$ the non-interactive ID
commitments are statistically hiding and the protocol $(\mathrm{PrsSWI}_x, \mathrm{VrsSWI}_x)$ is
statistical WI in Steps 1(c) and 3. Also note that only a single share is revealed in
the PRS phase. From this it follows that the prover's view when the verifier commits
to message $m$ is statistically close to its view when the verifier commits to $m'$, where
$m \ne m'$. Thus, the probability that it replies with the correct final message is
negligible. The complete protocol and proof appear in the full version [17].


7 2-Round Statistical Witness Indistinguishability 


In this section, we highlight the applicability of our techniques, and construct 
a simple two-round resettable statistical witness-indistinguishable argument for 
languages that have efficiently extractable perfectly binding instance-dependent 
commitment schemes. As discussed before, this class contains, in particular, all 
languages that admit hash proof systems. We note that all results in this section 
hold in the stronger model of statistical zero-knowledge where the verifier is 
computationally unbounded. 

Informally, the two-round WI argument consists of the verifier committing 
to a randomly chosen message m using the instance-dependent commitment 
scheme for that language. The prover, using the witness and the efficient extrac- 
tor, extracts a message m’ from the commitment and sends it to the verifier. 
The verifier accepts if m = m’. Intuitively, as long as verifier’s commitment is 
well-formed, this protocol is a perfect WI, as irrespective of the witness and ran- 
domness, the prover always extracts the same message (in fact, prover’s strategy 
is deterministic). Thus, the only complication is to ensure that verifier’s com- 
mitment is well-formed in a round efficient manner. We enforce this by making 
the verifier provide a non-interactive WI proof (i.e., a one-round ZAP [21, 15])
of “well-formedness” in the first round. 

For lack of space, details of the protocol and proof of security can be found 
in the full version of this paper. 
Abstract. We investigate the concrete security of black-box zero-
knowledge protocols when composed in parallel. As our main result, we 
give essentially tight upper and lower bounds (up to logarithmic factors 
in the security parameter) on the following measure of security (closely 
related to knowledge tightness): the number of queries made by black-box 
simulators when zero-knowledge protocols are composed in parallel. As a 
function of the number of parallel sessions, k, and the round complexity 
of the protocol, $m$, the bound is roughly $k^{1/m}$.

We also construct a modular procedure to amplify simulator-query 
lower bounds (as above), to generic lower bounds in the black-box con- 
current zero-knowledge setting. As a demonstration of our techniques, 
we give a self-contained proof of the $\Omega(\log n/\log\log n)$ lower bound for
the round complexity of black-box concurrent zero-knowledge protocols, 
first shown by Canetti, Kilian, Petrank and Rosen (STOC 2002). Addi- 
tionally, we give a new lower bound regarding constant-round black-box 
concurrent zero-knowledge protocols: the running time of the black-box 
simulator must be at least $n^{\Omega(\log n)}$.


Keywords: Zero-Knowledge, Knowledge Tightness, Concrete Security, 
Concurrent Zero-Knowledge Lower Bounds. 


1 Introduction 


Zero-knowledge interactive proofs, introduced by Goldwasser, Micali and Rackoff,
are paradoxical constructions allowing one player (called the prover)
to convince another player (called the verifier) of the validity of a mathematical
statement $x \in L$, while providing no additional knowledge to the verifier. In
addition to being an independent construction of interest, zero-knowledge proofs have
become an extremely useful tool in the construction of numerous cryptographic
protocols.
A fundamental question regarding zero-knowledge protocols is whether their
composition remains zero-knowledge. In theoretical constructions as well as in 
practice, a zero-knowledge protocol is sometimes composed in parallel (to amplify 
soundness or to improve efficiency, for example). It is well-known that the defi- 
nition of zero-knowledge (ZK) is not closed under parallel composition [GK96b]. 
Nevertheless, we know numerous constructions of constant-round zero-knowledge 
protocols that are secure when composed in parallel [Gol02]. As a 
result, the subject of ZK with respect to parallel composition is widely considered 
closed. 

We turn our attention to another fundamental question regarding zero- 
knowledge: its knowledge tightness. In its original definition, the zero-knowledge 
property is formalized by requiring that the view of any probabilistic polynomial 
time (PPT) verifier V in an interaction with a prover can be “indistinguishably 
reconstructed” by a PPT simulator S that interacts with no one. Since whatever 
V “sees” in the interaction can be reconstructed by the simulator, the interaction 
does not yield any knowledge to V that V cannot already compute by itself. Because
the simulator is allowed to be an arbitrary PPT machine, this traditional notion 
of ZK only guarantees that the class of PPT verifiers learn nothing. 

To more concretely measure the knowledge gained by a particular verifier, 
Goldreich, Micali and Wigderson [GMW91] (see also [Gol01]) put forward the 
notion of knowledge tightness: informally, the “tightness” of a simulation is the 
ratio of the (expected) running-time of the simulator, divided by the (worst-case) 
running-time of the verifier. Thus, in a knowledge-tight ZK proof, the verifier is 
expected to gain no more knowledge than what it could have computed in time 
closely related to its worst-case running-time. In addition to theoretical inter- 
ests, the knowledge tightness of a zero-knowledge protocol is a helpful aid for 
setting the security parameter in practice. It is easy to check that the original 
zero-knowledge protocols all enjoy constant knowl- 
edge tightness. The aforementioned protocols secure under parallel composition 
also enjoy constant knowledge tightness when executed 
in isolation; however, when composed in parallel, the tightness of these protocols
seems to increase (loosen) linearly (sometimes even quadratically) with respect
to the number of parallel sessions (based on the currently known analysis of their 
simulators)! 

Since we do want to execute zero-knowledge protocols in parallel (for instance 
in the application of secure multi-party computation), a natural question is to 
ask: how does the knowledge tightness of a protocol vary when we increase the 
number of parallel repetitions? 


1.1 Our Results 


In this work we give essentially tight upper and lower bounds to the above 
question. Our results focus on black-box zero-knowledge and “simulator queries”,
which we explain below. 

Informally, a protocol is black-box zero-knowledge if there exists a universal 
simulator S, called the black-box simulator, such that S generates the view of 
any adversarial verifier V* if S is given black-box access to V*. Essentially all
known constructions of zero-knowledge (with the notable exception of [Bar01]) 
and all practical zero-knowledge protocols are black-box zero-knowledge. Given 
a black-box simulator S, we focus on bounding the number of black-box queries 
made by S to a given adversarial verifier V*; we refer to this as the simulator- 
query complexity. It is easy to see that the number of queries made by a black- 
box simulator is closely related to knowledge tightness; in fact, for the case of 
constant round protocols, they are asymptotically equivalent. 
We state our main theorems below: 


Theorem 1. Let $n$ be the security parameter. For any $m = m(n)$, there exists a
$(2m + 7)$-round black-box zero-knowledge argument $\Pi$ for all of NP based on one-way
functions, with perfect completeness and negligible soundness error, such
that for any polynomially bounded $k = k(n)$, the parallel composition of $k$ copies
of the protocol, $\Pi^k$, remains black-box zero-knowledge with simulator-query
complexity $O(m k^{1/m} \log^2 n)$.


The above theorem can be extended to proofs assuming the existence of collision-resistant
hash-functions. We complement Theorem 1 with a lower bound:


Theorem 2. Let $n$ be the security parameter, $L$ be a language, and
$m = m(n) \in O(\log n/\log\log n)$. Suppose $\Pi$ is an $m(n)$-round black-box zero-knowledge argument for
$L$ with perfect completeness and negligible soundness error, and suppose there
exists a polynomially bounded $k(n) > n$ such that the parallel composition of
$k$ copies of the protocol, $\Pi^k$, remains black-box zero-knowledge with simulator-query
complexity $O(k^{1/m}/\log^2 n)$. Then, $L \in \mathrm{BPP}$.


For protocols with a sub-logarithmic number of rounds, Theorems 1 and 2 are tight
up to logarithmic factors in the security parameter; essentially, the simulator-query
complexity is asymptotically close to $k^{1/m}$ (in most cases, think of $k$
as a low polynomial in $n$). We mention that one can achieve simulator-query
complexity $O(m)$ (independent of $k$) when $m = \omega(\log n)$.

Briefly, our results show that the concrete security of constant-round black- 
box zero-knowledge protocols actually decays polynomially in the number of 
parallel sessions. Fortunately, this decay can be significantly slowed if we consider 
protocols with more rounds (even if we simply use a large constant m). 
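
To get a feeling for this trade-off, the following Python sketch evaluates the expression $m k^{1/m} \log^2 n$ from Theorem 1 for a few round parameters. The hidden constant is set to 1 and the logarithm is taken base 2; both are assumptions made only for illustration, so the absolute numbers carry no meaning beyond their trend.

```python
import math


def k_factor(k: int, m: int) -> float:
    """The k-dependent factor k^(1/m) of the simulator-query bound."""
    return k ** (1.0 / m)


def upper_bound(k: int, m: int, n: int) -> float:
    """m * k^(1/m) * log^2(n): the Theorem 1 expression with constant 1, log base 2."""
    return m * k_factor(k, m) * math.log2(n) ** 2


k, n = 10**6, 1024  # one million parallel sessions, security parameter 1024
table = [(m, round(k_factor(k, m)), round(upper_bound(k, m, n))) for m in (1, 2, 3, 6)]

for m, factor, bound in table:
    print(f"m = {m}: k^(1/m) ~ {factor}, m * k^(1/m) * log^2 n ~ {bound}")
```

For $k = 10^6$ the $k$-dependent factor drops from $10^6$ at $m = 1$ to $10^3$ at $m = 2$ and to $10$ at $m = 6$, which is the slowdown of the decay referred to above.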


1.2 Related Works 


While we are unaware of any past work that explicitly studies the knowledge 
tightness of parallelized zero-knowledge protocols, there are numerous related 
publications that focus on the composition of zero-knowledge protocols, or on the 
concrete security of zero-knowledge simulators. Dwork, Naor and Sahai
introduced the notion of concurrent zero-knowledge protocols; these protocols
must stay zero-knowledge even when composed arbitrarily (a strengthening over
parallel composition). Micali and Pass [MP06] introduced the notion of precision;
in a precise zero-knowledge protocol, the running time of the simulator should
be closely related to the running time of the adversarial verifier, on a view-by-view
basis¹ (a strengthening over knowledge tightness).

Even with these stronger requirements, Pandey et al. are able to construct
protocols that are simultaneously precise and (black-box) concurrent zero-knowledge.
Note that our results are incomparable with the result of Pandey et al.
for many reasons, one of which being that black-box concurrent zero-knowledge
protocols require logarithmically many rounds [CKPR01], while our setting is
mainly interesting for sub-logarithmic-round protocols. Interestingly, Pandey et al.
actually give a construction of a family of precise concurrent zero-knowledge
protocols, with trade-offs between round-complexity and precision, much like
our observed trade-off between round-complexity and knowledge tightness for
the case of parallelized zero-knowledge.


1.3 Connection to Concurrent Zero-Knowledge 


We also present a connection from simulator-query lower bounds for zero- 
knowledge, to round-complexity lower bounds for concurrent zero-knowledge 
(cZK). Due to lack of space we postpone the result on concurrent zero-knowledge 
to the full version. We briefly discuss the ideas as follows. 

We start by describing the common framework for all known black-box zero-knowledge
lower bounds (e.g., [KPR98, Ros00, CKPR01, BL02, Kat08, HRS09]).
Let $\Pi$ be a protocol for a language $L$. To show that $\Pi$ cannot be zero-knowledge
unless the language $L$ is trivial (i.e., $L \in \mathrm{BPP}$), we start by constructing a
decision procedure for $L$. Let $S$ be the black-box zero-knowledge simulator of
$\Pi$, and let $V^*$ be some “hard to simulate” adversarial verifier, and consider the
following decision procedure $D$: on input $x$, $D(x)$ accepts if and only if $S^{V^*}(x)$
generates an accepting view of $V^*(x)$. Usually, the completeness of $D$ follows
easily from the zero-knowledge property; to show that $D$ is sound often requires
more work. Our query-complexity lower bounds (Theorem 2) also follow the same
framework. That is, we construct some adversarial verifier $V^*_{\mathrm{para}}$ that schedules
multiple sessions in parallel, and show that for any zero-knowledge simulator $S$
with appropriately bounded query-complexity, if $x \notin L$, then $S^{V^*_{\mathrm{para}}}(x)$ cannot
generate an accepting view of $V^*_{\mathrm{para}}(x)$.

Inspired by the work of Canetti, Kilian, Petrank and Rosen, we next
present a modular construction of a concurrent adversarial verifier $V^*_{\mathrm{conc}}$ whose
purpose is to amplify query-complexity lower bounds of more basic verifiers. For
example, consider $V^*_{\mathrm{para}}$, an adversarial verifier that is restricted to parallel
composition. Our modular construction would take $V^*_{\mathrm{para}}$ as input, and output an
adversarial verifier $V^*_{\mathrm{conc}} = V^*_{\mathrm{conc}}(V^*_{\mathrm{para}})$ that, among other things, nests multiple
incarnations of $V^*_{\mathrm{para}}$ in a way that takes full advantage of the concurrent
scheduling. Under appropriate parameters, our analysis would conclude that for any
zero-knowledge simulator $S$ with polynomially bounded query-complexity, if $x \notin L$,


1 For example, to achieve precision 2, if the simulator S generates a view of V* and 
the running time of V* on that view is T, then the simulator S must have run in 
time 2T. 


516 K.-M. Chung, R. Pass, and W.-L. Dustin Tseng 


then $S^{V^*_{\mathrm{conc}}}(x)$ cannot generate an accepting view of $V^*_{\mathrm{conc}}(x)$ (recall again that
this is the key step for most zero-knowledge lower bounds).

To demonstrate our framework, we re-prove the result of Canetti, Kilian, Petrank
and Rosen [CKPR01]: an $\Omega(\log n/\log\log n)$ round-complexity lower bound for
black-box concurrent zero-knowledge (the currently best known round-complexity
lower bound); we believe the resulting analysis is quite clean. We also give a
second lower bound concerning constant-round cZK protocols:


Theorem (Informal). Let $L$ be a non-trivial language, and let $\Pi$ be a constant-round
black-box concurrent zero-knowledge protocol with a possibly
super-polynomial time simulator. Then the simulator must run in time $n^{\Omega(\log n)}$.


Incidentally, Pass and Venkitasubramaniam do construct constant-round 
black-box concurrent zero-knowledge protocols for all of NP in the model where 
both the simulator and the adversarial verifier run in quasi-polynomial time
$n^{\mathrm{poly}(\log n)}$.

We also find our modular framework satisfying on a philosophical level: it 
serves as a framework in which lower bounds for restricted compositions of
zero-knowledge (in this example parallel composition) can be transformed into 
lower bounds for zero-knowledge in the fully concurrent setting. A similar and 
celebrated example occurs in the work of Goldreich [Gol02], where it is shown 
that constructions of zero-knowledge protocols secure under parallel composition 
directly lead to constructions of concurrent zero-knowledge protocols secure in
the timing model. 


2 Preliminaries 


We use $\mathbb{N}$ to denote the natural numbers $\{0, 1, \ldots\}$, $[n]$ to denote the set $\{1, \ldots, n\}$,
and $|x|$ to denote the length of a string $x \in \{0,1\}^*$. By $\mathrm{ngl}(n)$, we mean a function
negligible in $n$ (i.e., $1/n^{\omega(1)}$). We assume familiarity with indistinguishability.


Interactive Protocols. An interactive protocol $\Pi$ is a pair of interactive Turing
machines, $(P, V)$, where $V$ is probabilistic polynomial time (PPT). $P$ is called
the prover, while $V$ is called the verifier. $\langle P, V \rangle(x)$ denotes the random variable
(over the randomness of $P$ and $V$) representing $V$'s output at the end of the
interaction on common input $x$. If additionally $V$ receives auxiliary input $z$, we
write $\langle P(x), V(x, z) \rangle$ to denote $V$'s output. We assume WLOG that $\Pi$ starts with
a verifier message and ends with a prover message, and say $\Pi$ has $k$ rounds if the
prover and verifier each send $k$ messages alternately. A full or partial transcript
of $\Pi$ is a sequence of alternating verifier and prover messages, $(v_1, p_1, \ldots)$, where
$v$ denotes verifier messages and $p$ denotes prover messages.

We may compose an interactive proof in parallel. Let $\Pi^k = (P^k, V^k)$ be the
parallel composition of $k$ copies of $\Pi$; that is, each prover and verifier message
in $\Pi^k$ is just the concatenation of $k$ independent copies of the corresponding message
in $\Pi$. Upon completion, $V^k$ accepts if and only if all $k$ sessions are accepted by
$V$. We note that an adversarial verifier may choose to abort in one session but
not another.
Zero Knowledge Protocols. In the setting of zero knowledge, we consider an
adversarial verifier that attempts to “gain knowledge” by interacting with an
honest prover. An adversarial verifier $V^*$ is a probabilistic polynomial time
machine that, on common input $x$ and auxiliary input $z$, interacts with the
prover $P$. Let $\mathrm{View}^P_{V^*}(x, z)$ be the random variable that denotes the view of $V^*$
in an interaction with $P$ (this includes the random coins of $V^*$ and the messages
received by $V^*$).

A black-box simulator $S$ is a probabilistic polynomial time machine that
is given black-box access to $V^*$ (written as $S^{V^*}$). Formally, $S$ fixes the random
coins $r$ of $V^*$ a priori, and $S$ is allowed to specify a valid partial transcript
$\tau = (v_1, p_1, \ldots, p_i)$ of $V^*_r$, and query $V^*_r$ for the next verifier message $v_{i+1}$. Here,
$\tau$ is valid if it is consistent with $V^*_r$, i.e., each verifier message $v_j$ in $\tau$ is what
$V^*_r$ would have responded given the previous prover messages $p_1, \ldots, p_{j-1}$ and
the fixed random tape $r$. Note that $S$ is allowed to “rewind” $V^*$ by querying $V^*$
with different partial transcripts that share a common prefix.
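
The query interface just described can be mimicked by a small Python sketch. The class `BlackBoxVerifier` and its hash-based response rule are hypothetical: they stand in for an arbitrary verifier $V^*_r$ whose next message is a deterministic function of its fixed coins and the prover messages received so far. The counter records the number of oracle queries, which is the quantity bounded in this paper.

```python
import hashlib


class BlackBoxVerifier:
    """Toy oracle for V*_r: deterministic once the random coins r are fixed."""

    def __init__(self, coins: bytes):
        self.coins = coins
        self.queries = 0  # number of black-box queries answered so far

    def _respond(self, prover_msgs: list) -> bytes:
        # Next verifier message as a function of the coins and p_1, ..., p_i.
        h = hashlib.sha256(self.coins)
        for p in prover_msgs:
            h.update(len(p).to_bytes(4, "big"))
            h.update(p)
        return h.digest()[:8]

    def is_valid(self, transcript: list) -> bool:
        """transcript is a list of (v_j, p_j) pairs; every v_j must match V*_r."""
        prover_msgs = []
        for v, p in transcript:
            if v != self._respond(prover_msgs):
                return False
            prover_msgs.append(p)
        return True

    def next_message(self, transcript: list) -> bytes:
        """Answer one query: return v_{i+1} for a valid partial transcript."""
        if not self.is_valid(transcript):
            raise ValueError("partial transcript is not consistent with V*_r")
        self.queries += 1
        return self._respond([p for _, p in transcript])


verifier = BlackBoxVerifier(coins=b"fixed random tape r")
v1 = verifier.next_message([])                          # first verifier message
v2_a = verifier.next_message([(v1, b"prover msg A")])   # continue the session
v2_b = verifier.next_message([(v1, b"prover msg B")])   # rewind: same prefix, new p_1
```

Rewinding is nothing more than a second query that shares the prefix `v1` but supplies a different prover message; each such query increments `verifier.queries`.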

Intuitively, an interactive proof is zero-knowledge (ZK) if the view of any 
adversarial verifier V* can be generated by a simulator. The formal definition 
follows. 


Definition 3 (Black-Box Zero-Knowledge [GO94]). Let $\Pi = (P, V)$
be an interactive proof (or argument) for a language $L$. $\Pi$ is black-box
zero-knowledge if there exists a black-box simulator $S$ such that for every
common input $x$, auxiliary input $z$ and every adversary $V^*$, $S^{V^*(x,z)}(x)$ runs
in time polynomial in $|x|$, and the ensembles $\{\mathrm{View}^P_{V^*}(x, z)\}_{x \in L, z \in \{0,1\}^*}$ and
$\{S^{V^*(x,z)}(x)\}_{x \in L, z \in \{0,1\}^*}$ are computationally indistinguishable as a function of
$|x|$.


Other Primitives. In our construction of zero-knowledge arguments we use a few 
other primitives including Witness-Indistinguishable (WI) Proofs [FS90], Proofs 
of Knowledge (POK) [BGO02], and Special-Sound (SS) Proofs [CDS94]. 
Due to lack of space, we refer the readers to the full version of this paper for a 
more detailed description of these primitives. 


3 Construction 


We define a zero-knowledge argument PARALLELZK in Section 3.1 and show
that it satisfies Theorem 1 in Section 3.2.


3.1 The Protocol 


Our ZK argument PARALLELZK (also used in [PV08, PTV10]) is a slight variant
of the precise ZK protocol of [MP06], which in turn is a generalization of the
Feige-Shamir protocol [FS89]. The protocol for language $L \in \mathrm{NP}$ proceeds in
three stages, given a security parameter $n$, a common input statement
$x \in \{0,1\}^n$, and a round-parameter $m$:
Stage Init: The verifier picks two random strings $r_1, r_2 \in \{0,1\}^n$ and sends
their images $c_1 = f(r_1)$, $c_2 = f(r_2)$ through a one-way function $f$ to the
prover. The verifier then acts as the prover in $m$ parallel instances of a 4-round
witness indistinguishable and special sound proof of knowledge (WI
and SS-POK) of the NP statement “$c_1$ or $c_2$ is in the image set of $f$” (a
witness here would be a pre-image of $c_1$ or $c_2$). All but the last two messages
of each SS-POK are exchanged in this stage; we denote their partial transcripts
by $(\alpha_1, \alpha_2, \ldots, \alpha_m)$.

Stage 1: $m$ rounds of message exchanges occur in Stage 1. In the $j$-th round,
the prover sends $\beta_j$, a random second last message of the $j$-th SS-POK, and
the verifier replies with the last message $\gamma_j$ of the proof. These $m$ rounds are
called slots. Slot $i$ is convincing if the verifier produces an accepting proof
(i.e., the transcript $(\alpha_i, \beta_i, \gamma_i)$ is accepting). If there is ever an unconvincing
slot, the prover aborts the whole session.

Stage 2: The prover provides a 4-round witness indistinguishable proof of knowledge
(WI-POK) of the statement “$x \in L$, or one of $c_1$ or $c_2$ is
in the image set of $f$”.


Completeness and soundness follow directly from the proof of Feige and Shamir
[FS89]; in fact, the protocol is an instantiation of theirs. Intuitively, to cheat in
the protocol a prover must “know” an inverse to $c_1$ or $c_2$ (because Stage 2 is an
argument of knowledge), which requires the prover to invert the one-way function
$f$ (Stage Init and Stage 1 of the protocol have been shown not to aid
the prover in inverting $f$). A formal description of protocol PARALLELZK is shown
in Figure 1.


Remark 4. We note that here we use multiple slots to improve the knowledge
tightness of parallel zero knowledge, whereas previously, multiple slots were
typically used to achieve concurrent zero knowledge and $\omega(\log n)$ slots were
considered. In contrast, we show that in the context of parallel zero knowledge, using
even a constant number of slots improves the knowledge tightness significantly.
Indeed, both our simulation technique and its analysis presented in the next section
are new: we rewind each slot to resolve all sessions in parallel (as opposed
to previous works that focused on one session at a time).


3.2 The Simulator 


To show that protocol $\Pi$ = PARALLELZK satisfies Theorem 1, given any
polynomially bounded $k = k(n)$, we need to construct a black-box zero-knowledge
simulator $S$ for protocol $\Pi^k$ (PARALLELZK repeated $k$ times in parallel).
At a very high level, our simulator follows that of Feige and Shamir [FS90]:
after fixing the SS-POK prefixes in Stage Init, the simulator rewinds one of the
“slots” in Stage 1 (the last two messages of the SS-POKs). If the verifier responds
with two convincing slots, the simulator uses the special-soundness property to
extract a “fake witness” $r$ such that $f(r) = c_1$ or $c_2$, and uses this fake witness
to simulate Stage 2 of the protocol.

Common Input: an instance $x$ of a language $L$ with witness relation $R_L$.
Auxiliary Input for Prover: a witness $w$ such that $w \in R_L(x)$.
Stage Init:
    V uniformly chooses $r_1, r_2 \in \{0,1\}^n$.
    V → P: $c_1 = f(r_1)$, $c_2 = f(r_2)$.
    V ⇔ P: Exchange in parallel (interactively) all but the last two messages
    $\alpha_1, \ldots, \alpha_m$ of $m$ WI and SS-POKs on common input $(c_1, c_2)$ with respect
    to the witness relation:

        $R_f(c_1, c_2) = \{ r : f(r) = c_1 \text{ or } c_2 \}$

    Note that V acts as the prover in these SS-POKs.
Stage 1: For $i = 1$ to $m$, exchange the $i$-th “slot”:
    P → V: The second last message $\beta_i$ of the $i$-th SS-POK.
    V → P: The last message $\gamma_i$ of the $i$-th SS-POK.
    P aborts if $(\alpha_i, \beta_i, \gamma_i)$ is not a valid SS-POK.
Stage 2:
    P ⇔ V: a 4-round computational-WI proof of knowledge from P to V on common
    input $(c_1, c_2, x)$ with respect to the witness relation:

        $R_{f \vee L}(c_1, c_2, x) = \{ (r, w) : r \in R_f(c_1, c_2) \text{ or } w \in R_L(x) \}$


Fig. 1. PARALLELZK: a ZK argument for NP with round parameter m


Given an adversarial verifier $V^*$ (for protocol $\Pi^k$) and a common input
$x \in \{0,1\}^n$, the simulator $S^{V^*}(x)$ does the following:


1. The simulator S interacts with V*, following the honest prover strategy, 
until the end of Stage 1. We call this the reference simulation. 

2. The simulator S attempts to resolve all k parallel sessions in the reference 
simulation by extracting a fake witness r from the SS-POKs for each non- 
aborting session; aborted sessions are automatically considered resolved (and 
no fake witnesses are needed). To do so, S repeats the following step (called 
a rewinding pass) as many times as necessary, until all sessions are resolved. 

3. A rewinding pass. For each slot $i$, the simulator rewinds the reference
simulation back to the beginning of slot $i$, sends $V^*$ a fresh random message
$\beta'_i$, and receives a new reply $\gamma'_i$ (of course this is done in parallel for all
$k$ sessions). Note that for each unresolved session $j$, $S$ already knows an
accepting transcript $(\alpha_i, \beta_i, \gamma_i)$ of SS-POK from the reference simulation.
If session $j$ does not abort during slot $i$ in this rewinding pass, then $S$
learns another accepting transcript $(\alpha_i, \beta'_i, \gamma'_i)$ of SS-POK. In this case, $S$
can resolve session $j$ by extracting a fake witness using the special-soundness
property.

4. $S$ completes the reference simulation using extracted fake witnesses to
simulate the Stage 2 proof (only needed in each parallel session that did not
abort). $S$ outputs the view of $V^*$ on the reference simulation and this
completion.

For simplicity, we assume that for sessions that did not abort in the reference
simulation, the extraction of fake witnesses always succeeds whenever $S$ receives
an accepting slot in a rewinding pass (i.e., we assume that $S$ never sends the
same value for $\beta$ twice). This assumption can be made without loss of generality
by the following modifications of the simulation strategy.


- Let the simulator $S$ perform at most $2^n$ rewinding passes. If there exist
any unresolved sessions $j$ after $2^n$ rewinding passes, $S$ resolves each such
session by brute force, i.e., by directly inverting the one-way function $f$ to
obtain a fake witness of length $n$. This modification increases the running time
(but not the number of queries) of $S$ by at most a poly($n$) factor (multiplicatively),
and makes sure that $S$ makes at most poly($2^n$) queries to $V^*$.

- Let the final verifier challenge in the SS-POK have length $|\beta| = n^2$. In this
case, the probability of $S$ ever querying $V^*$ with the same value of $\beta$ twice
is poly($2^n$) $\cdot\, 2^{-n^2} = 2^{-\Omega(n^2)}$, which is negligible in $n$.
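
The control flow of Steps 1 to 4 can be sketched in a few lines of Python. This is only a toy model under loud assumptions, not the authors' simulator: the verifier is abstracted as a hypothetical callback `survives(slot, session, rng)` that reports whether a session answers a fresh slot message convincingly (real verifiers may depend on the full history), special-sound extraction is modeled as simply marking the session resolved, and `max_passes` is an illustrative stand-in for the $2^n$ cap.

```python
import random

def simulate(m, k, survives, rng, max_passes=10_000):
    """Toy model of the simulator for k parallel sessions with m slots each.

    survives(slot, session, rng) -> bool is a hypothetical stand-in for
    "V* answers a fresh slot message convincingly in this session".
    Returns (number of rewinding passes, number of slot queries sent to V*).
    """
    # Step 1: reference simulation. A session stays alive only if every
    # one of its m slots is convincing; otherwise the prover aborts it.
    alive = set(range(k))
    queries = 0
    for slot in range(m):
        queries += 1  # one parallel message per slot
        alive = {j for j in alive if survives(slot, j, rng)}

    # Step 2: aborted sessions count as resolved; the rest need a fake witness.
    unresolved = set(alive)
    passes = 0
    while unresolved and passes < max_passes:
        # Step 3: one rewinding pass rewinds every slot once, in parallel
        # for all sessions.
        passes += 1
        for slot in range(m):
            queries += 1
            # A second accepting transcript for a slot resolves the session.
            unresolved = {j for j in unresolved if not survives(slot, j, rng)}
    # Step 4 (simulating Stage 2 with the fake witnesses) is not modeled here.
    return passes, queries
```

For instance, a verifier that never aborts is resolved in a single pass, while a verifier that always aborts needs no rewinding at all.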


We now show two lemmas regarding S that together show that PARALLELZK is 
zero-knowledge when composed in parallel. 


Lemma 5. $S$ runs in expected polynomial time, and makes $O(m k^{1/m} \log^2 n)$
queries in expectation.


Lemma 6. On common input $x \in L$, the output of $S$ is indistinguishable from
the real view of $V^*$.


We give a sketch of the proof of Lemma 6 first, and then prove Lemma 5 by bounding
the expected number of rewinding passes before $S$ extracts all necessary fake
witnesses.


Proof (Proof Sketch of Lemma 6). The output of $S$ up to the end of Stage 1
(i.e., the reference simulation) is identical to the view of $V^*$, because $S$ follows
the honest prover strategy. The output of $S$ in Stage 2 of the protocol is
computationally indistinguishable from the view of $V^*$ because the Stage 2 proof is
witness indistinguishable. Formally, this can be shown with a hybrid argument
where we incrementally switch each of the $k$ parallel Stage 2 proofs from using
a “fake witness” $r$ such that $f(r) = c_1$ or $c_2$ (the simulator strategy), to using a real
witness $w$ for $x \in L$ (the honest prover strategy).


Proof of Lemma 5. We proceed to prove Lemma 5 by bounding the expected
number of rewinding passes in an execution of $S$. Let $R$ be a random variable
that denotes the number of rewinding passes. We will show that:


$$E[R] = E[\#\text{ rewinding passes}] \le O(k^{1/m} \cdot \log^2 n).$$


This then implies Lemma 5 because each rewinding pass makes $m$ queries, and
outside of rewinding passes, $S^{V^*}(x)$ makes only $O(m)$ queries to $V^*$ and runs
in polynomial time.

Before presenting our analysis for the general case of $m$ slots, we revisit the
classical analysis for the case of a single slot for intuition.

The case of single slot. The analysis is very simple. For every $j \in [k]$, let $R_j$
denote the number of rewinding passes to resolve session $j$, and let $p_j$ be the
probability that session $j$ does not abort during the single slot. Recall that
session $j$ is resolved if it aborts in the reference simulation, and otherwise, the
simulator needs to rewind the slot several times until session $j$ does not abort
again. Hence, the expected number of rewinding passes to resolve session $j$ is

$$E[R_j] = (1 - p_j) \cdot 0 + p_j \cdot \frac{1}{p_j} = 1.$$


By linearity of expectation, the expected number of rewinding passes is


$$E[R] \le \sum_{j} E[R_j] = k \le O(k \cdot \log^2 n).$$


We note that the above simple analysis is tight. Consider the case where during
the slot, each session aborts independently with probability $(1 - 1/k)$. It is not
hard to see that in this case, with constant probability, at least one session
does not abort during the slot, and the simulator needs to rewind $k$ times in
expectation to resolve the surviving session. Therefore, the expected number of
rewinding passes is $\Omega(k)$.

In fact, it is instructive to note that the following natural generalization of the
above example is essentially the worst-case example for the general case of $m$
slots: during each slot $i \in [m]$, each surviving session $j$ aborts independently with
probability $(1 - k^{-1/m})$. In this case, each session does not abort during the $m$
slots with probability $(k^{-1/m})^m = 1/k$, and hence with constant probability, at
least one session survives after $m$ slots. Resolving the surviving session requires
$k^{1/m}/m$ rewinding passes in expectation, and hence the expected number of
rewinding passes is $\Omega(k^{1/m}/m)$.
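
The quantities in these two examples are easy to evaluate numerically. The following Python sketch is only an illustration; the function names are hypothetical, and `expected_passes_one_survivor` simply evaluates the estimate $k^{1/m}/m$ quoted above rather than deriving it.

```python
def survival_prob(k, m):
    """Probability that one session survives all m slots when each slot
    is survived independently with probability k**(-1/m)."""
    return (k ** (-1.0 / m)) ** m

def prob_some_survivor(k, m):
    """Probability that at least one of k independent sessions survives."""
    return 1.0 - (1.0 - survival_prob(k, m)) ** k

def expected_passes_one_survivor(k, m):
    """The estimate k**(1/m) / m for the expected number of rewinding
    passes needed to resolve a single surviving session."""
    return k ** (1.0 / m) / m
```

For example, with $k = 1024$ sessions and $m = 2$ slots, a session survives with probability $1/1024$, some session survives with probability above $0.63$, and the estimate gives $16$ passes instead of the $1024$ needed with a single slot.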

We note that although in the above example each session aborts during each
slot independently, in general the aborting probability of each session at each
slot can depend arbitrarily on the history, and these probabilities can be
arbitrarily correlated.


The general case of m slots. To analyze the expected number of rewinding
passes, we define the following $[0, 1]$-valued random variables based on the
reference simulation generated in Step 1. Let $h_i$ denote the partial transcript of the
reference simulation before slot $i$. For every slot $i \in [m]$ and session $j \in [k]$, we
define the random variable $p_{i,j}$ as follows.


- If session $j$ has already aborted by the end of slot $i$, then we define $p_{i,j} := 1$.
- Otherwise, we define $p_{i,j}$ to be the conditional probability


$$p_{i,j} = \Pr[\text{session } j \text{ does not abort during slot } i \mid h_i].$$


For intuition, $p_{i,j}$ is essentially the probability that $S$ can resolve session $j$ by
rewinding slot $i$. Now consider the best slot for each session, that is, the slot with the
highest $p_{i,j}$ value (this is the slot that $S$ wants to rewind). We record this value
as

$$p_j^* = \max_{i \in [m]} p_{i,j}.$$

Note that for a session $j$ that aborts in the reference simulation, we have $p_j^* = 1$,
indicating that session $j$ is already resolved and matching the above intuition.
Finally, the number of rewinding passes depends heavily on the worst session,
that is, the session with the worst $p_j^*$ value (the “worst best slot”). We record this value
as the critical probability:

$$p^* = \min_{j \in [k]} p_j^*.$$
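
As a concrete reading of these definitions, the following Python sketch computes the per-session best-slot values and the critical probability from a table of the $p_{i,j}$. The table layout (`p[i][j]`, zero-based indices) and the function name are assumptions made for illustration only.

```python
def critical_probability(p):
    """p[i][j] = probability that session j survives slot i (given h_i),
    with p[i][j] = 1 for sessions that aborted in the reference simulation.
    Returns (best, p_star) where best[j] = max_i p[i][j] and
    p_star = min_j best[j]."""
    m, k = len(p), len(p[0])
    best = [max(p[i][j] for i in range(m)) for j in range(k)]
    return best, min(best)
```

With two slots and three sessions, `critical_probability([[0.5, 0.1, 1.0], [0.2, 0.3, 1.0]])` picks slot 0 for the first session, slot 1 for the second, and treats the third (aborted) session as resolved, so the critical probability is 0.3.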


To see how the critical probability $p^*$ plays an important role in the expected
number of rewinding passes, note that on one hand, $S$ needs roughly $1/p^*$
rewinding passes to resolve the worst-case session; on the other hand, a
reference simulation with small critical probability (say, $p^* < p$) is rare
(it occurs with probability at most $p^m$). Therefore, to upper bound $E[R]$, we define the
following events, which partition the probability space according to the critical
probability. For every $t \in \mathbb{N}$, let

$$\alpha_t := \frac{1}{2^t \cdot k^{1/m}}.$$


- Let $A_0$ be the event that $p^* \ge \alpha_0 = k^{-1/m}$, and for every $t \ge 1$, let $A_t$ be
the event that

$$\alpha_t \le p^* < \alpha_{t-1}.$$

Similarly for every session $j \in [k]$,


- Let $A_{0,j}$ be the event that $p_j^* \ge \alpha_0 = k^{-1/m}$, and for every $t \ge 1$, let $A_{t,j}$
be the event that

$$\alpha_t \le p_j^* < \alpha_{t-1}.$$
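
To make the partition concrete, the following Python sketch computes the thresholds $\alpha_t$ and the index $t$ of the event $A_t$ that a given critical probability falls into. The function names are hypothetical and the code is only a sketch of the definitions above.

```python
def alpha(t, k, m):
    """Threshold alpha_t = 1 / (2**t * k**(1/m))."""
    return 1.0 / (2 ** t * k ** (1.0 / m))

def bucket(p_star, k, m):
    """Smallest t >= 0 with p_star >= alpha_t, i.e. the index of the
    event A_t that contains this critical probability."""
    if not 0.0 < p_star <= 1.0:
        raise ValueError("p_star must lie in (0, 1]")
    t = 0
    while p_star < alpha(t, k, m):
        t += 1
    return t
```

For $k = 16$ and $m = 2$ the thresholds are $0.25, 0.125, 0.0625, \ldots$, so a critical probability of $0.2$ lies in $A_1$ and one of $0.1$ lies in $A_2$.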


We can now express the expectation of the number of rewinding passes as follows.


$$
\begin{aligned}
E[R] &= \sum_{t \ge 0} \Pr[A_t] \cdot E[R \mid A_t] \\
     &\le \Pr[A_0] \cdot E[R \mid A_0] + \sum_{t \ge 1} \Big( \sum_{j=1}^{k} \Pr[A_{t,j}] \Big) \cdot E[R \mid A_t],
\end{aligned}
$$


where the last inequality follows from $A_t \subseteq \bigcup_j A_{t,j}$ (which follows from the definitions).
We proceed to bound each term. For $A_0$, we use the trivial bound $\Pr[A_0] \le 1$. For
general $t \ge 1$ and every $j \in [k]$, we first observe that when $A_{t,j}$ happens, session
$j$ does not abort in any of its $m$ slots in the reference simulation (since otherwise,
$p_j^* = 1$). This happened despite the fact that each slot $i$ of session $j$ in the
reference simulation could only have survived (not aborted) with probability
$p_{i,j} < \alpha_{t-1}$. Thus,


$$\Pr[A_{t,j}] \le \left( \frac{1}{2^{t-1} \cdot k^{1/m}} \right)^{m} = \frac{1}{2^{m(t-1)} \cdot k},$$


and

$$\sum_{j=1}^{k} \Pr[A_{t,j}] \le k \cdot \frac{1}{2^{m(t-1)} \cdot k} = \frac{1}{2^{m(t-1)}}.$$


It remains to bound $E[R \mid A_t]$, which is done in the following lemma.


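Lemma 7. For every $t \ge 0$, we have


$$E[R \mid A_t] \le O(2^t \cdot k^{1/m} \cdot \log^2 n).$$


We first apply Lemma 7 to upper bound $E[R]$.


$$
\begin{aligned}
E[R] &\le E[R \mid A_0] + \sum_{t \ge 1} \frac{1}{2^{m(t-1)}} \cdot E[R \mid A_t] \\
     &\le O(k^{1/m} \cdot \log^2 n) + \sum_{t \ge 1} \frac{1}{2^{m(t-1)}} \cdot O(2^t \cdot k^{1/m} \cdot \log^2 n) \\
     &\le O(k^{1/m} \cdot \log^2 n).
\end{aligned}
$$

This completes the proof of Lemma 5.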
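
The last step relies on the series $\sum_{t \ge 1} 2^{-m(t-1)} \cdot 2^t$ being bounded by a constant. A small numeric check, given here as a sketch under the assumption $m \ge 2$, illustrates this: the terms form a geometric series with ratio $2^{-(m-1)}$ and sum $2/(1 - 2^{-(m-1)}) \le 4$. For $m = 1$ the terms do not decay, and the direct single-slot analysis given earlier applies instead. The function name is hypothetical.

```python
def weighted_sum(m, terms=200):
    """Partial sum of sum_{t>=1} 2**(-m*(t-1)) * 2**t, the factor that
    multiplies k**(1/m) * log^2 n in the bound on E[R] (the constants
    hidden in the O-notation are dropped)."""
    return sum(2.0 ** (t - m * (t - 1)) for t in range(1, terms + 1))
```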


Proof (Proof of Lemma 7). The event $A_t$ implies that in the reference simulation,
for every non-aborting session $j$, there exists a useful slot $i \in [m]$ such that


$$\Pr[\text{session } j \text{ is not aborted after slot } i \mid h_i] = p_{i,j} \ge \alpha_t.$$

Therefore, in each rewinding pass, the simulator $S$ may learn an (additional)
accepting transcript of SS-POK in session $j$ with probability at least $\alpha_t$, allowing
it to extract a fake witness.

Fix a non-aborting session $j$, and define


$$q = \frac{10 \cdot \log^2 n}{\alpha_t} = O(2^t \cdot k^{1/m} \cdot \log^2 n).$$


Because the rewinding passes are independent, we have

$$\Pr[\text{session } j \text{ is resolved after } q \text{ rewinding passes}] \ge 1 - (1 - \alpha_t)^q \ge 1 - \mathrm{ngl}(n).$$

Since there are at most $k$ surviving sessions, by the union bound,

$$\Pr[\text{all sessions are resolved after } q \text{ rewinding passes}] \ge 1 - \mathrm{ngl}(n).$$
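
The choice of $q$ can be checked numerically. In the following Python sketch the function names are hypothetical and the logarithm is taken to be the natural logarithm, which is an assumption (the text does not fix the base); the check uses $(1 - \alpha_t)^q \le e^{-\alpha_t q} \le e^{-10 \log^2 n}$.

```python
import math

def passes_for_bucket(alpha_t, n):
    """q = ceil(10 * log(n)**2 / alpha_t) rewinding passes."""
    return math.ceil(10 * math.log(n) ** 2 / alpha_t)

def miss_probability(alpha_t, q):
    """Upper bound (1 - alpha_t)**q on the probability that a fixed session
    stays unresolved after q independent rewinding passes, each of which
    succeeds with probability at least alpha_t."""
    return (1.0 - alpha_t) ** q
```

For example, with $\alpha_t = 0.01$ and $n = 1024$ the sketch yields roughly $4.8 \cdot 10^4$ passes, after which a fixed session remains unresolved with probability below $e^{-480}$.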


In other words, every $q$ rewinding passes resolve all the sessions with
probability at least $1 - \mathrm{ngl}(n)$. It follows that


$$E[R \mid A_t] \le (1 - \mathrm{ngl}(n)) \cdot q + \mathrm{ngl}(n)(1 - \mathrm{ngl}(n)) \cdot 2q + \cdots \le O(q) = O(2^t \cdot k^{1/m} \cdot \log^2 n).$$


4 Lower Bound


The proof of Theorem 2 follows a well-known framework (e.g., [GK96b, CKPR01]).
Let $S$ be a black-box zero-knowledge simulator for $\Pi^k = (P^k, V^k)$ that makes
fewer than $q = O(k^{1/m}/\log^2 n)$ queries, and let $V^{k*}$ be a particular adversarial
verifier to be specified later. We define $D$, a BPP decision procedure for $L$, by
combining $S$ and $V^{k*}$: on input instance $x$, $D(x)$ accepts if and only if $S^{V^{k*}}(x)$
outputs an accepting view of $V^{k*}$ (i.e., all $k$ sessions of $V^{k*}$ accept). Using the
zero-knowledge property, it is easy to show (see for example [GK96b]) that if
the modified protocol $\Pi^{k*} = (P^k, V^{k*})$ is complete for $L$ (based on our choice
of $V^{k*}$), then $D$ is complete for $L$ as well. The main effort of the proof is to
show that $D$ is sound; this relies both on the choice of $V^{k*}$ and the fact that $S$
makes fewer than $q$ queries to $V^{k*}$. We discuss our choice of $V^{k*}$ in Section 4.1
and analyze the soundness of $D$ in Section 4.2.


4.1 The Random Termination Verifier $V^{k*}$


In this section, we define a verifier $V^{k*}$ for the parallelized protocol with two
goals in mind: the protocol $\Pi^{k*} = (P^k, V^{k*})$ should be complete (so that $D$
is complete), and $V^{k*}$ should be sound against any rewinding simulator $S$ that
makes fewer than $q$ queries to $V^{k*}$ (so that $D$ is sound).

Just as in [CKPR01], we define $V^{k*}$ to follow the honest verifier strategy $V^k$
with one extra property: random termination.² Whenever the prover $P^k$ or the
rewinding simulator $S$ makes a query to $V^{k*}$, $V^{k*}$ decides, with independent
and fresh randomness,³ whether or not to terminate immediately and accept; it
terminates with probability $p \in [0, 1]$, a parameter to be specified later. This is
done independently for each of the $k$ parallel sessions (i.e., one session may be
terminated while other sessions continue). Due to this independence among parallel
sessions, we often treat $V^{k*}$ as $k$ machines $(V_1^*, \ldots, V_k^*)$, each responsible for
making the decision to terminate and generating the verifier messages for one session.
Note that the fresh randomness is only used to decide whether to terminate or not; $V^{k*}$
generates protocol messages using its default random tape, which is kept the same
between rewinds (as expected by following the honest verifier strategy).
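
Footnote 3 derives the fresh per-query randomness by hashing the current transcript with a $q$-wise independent hash function. One standard way to obtain such a family, used here purely as an assumption of this sketch (the paper does not fix a construction), is a random polynomial of degree less than $q$ over a prime field. All names (`QWiseHash`, `termination_decisions`), the field size, the SHA-256 input encoding, and the expansion of the hash output into $k$ coin flips through Python's `random.Random` are illustrative choices, not part of the original protocol.

```python
import hashlib
import random

P = (1 << 61) - 1  # a Mersenne prime; the field size is an illustrative choice

class QWiseHash:
    """Random polynomial of degree < q over GF(P): a standard q-wise
    independent family on inputs that map to distinct field elements."""

    def __init__(self, q, rng):
        self.coeffs = [rng.randrange(P) for _ in range(q)]

    def __call__(self, transcript: bytes) -> int:
        # Encode the transcript as a field element. SHA-256 is only used
        # as a convenient encoding here (an assumption of this sketch).
        x = int.from_bytes(hashlib.sha256(transcript).digest(), "big") % P
        acc = 0
        for c in reversed(self.coeffs):  # Horner evaluation
            acc = (acc * x + c) % P
        return acc

def termination_decisions(h, transcript: bytes, k: int, p: float):
    """Per-session termination decisions for one query: the hash of the
    transcript serves as a fresh random tape, from which each of the k
    sessions independently terminates with probability p."""
    tape = random.Random(h(transcript))
    return [tape.random() < p for _ in range(k)]
```

Because the decisions depend only on the sampled hash function and the transcript, rewinding the verifier to the same transcript reproduces the same decisions, while distinct transcripts receive independent-looking tapes.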

Clearly, $\Pi^{k*} = (P^k, V^{k*})$ is still complete. It remains to show that $V^{k*}$ is
“sound” against the rewinding $S$; that is, on input $x \notin L$, $S^{V^{k*}}$ is unlikely to
generate an accepting transcript of $V^{k*}$. From now on we drop the common
input $x \notin L$. Intuitively, by randomly terminating, $V^{k*}$ can better protect its
randomness against $S$’s rewinds (when $V^{k*}$ terminates, $S$ learns nothing about
$V^{k*}$’s fixed random tape), thus ensuring soundness. To make this intuition more
concrete, suppose for example that $S$ made $q$ queries $\tau_1, \ldots, \tau_q$ to $V^{k*}$, and
without loss of generality outputs the view of $V^{k*}$ on a subset of size $m$ of those
queries, $T = \{\tau_{i_1}, \ldots, \tau_{i_m}\}$. Further suppose that there exists a parallel session
$j \in [k]$ such that $V^{k*}$ does not terminate on the queries in $T$, but terminates on
all remaining queries. Then intuitively, $S$’s rewinding does not help $S$ convince
$V^{k*}$ in session $j$, and the soundness of the original protocol $\Pi$ should imply that
$V^{k*}$ rejects with overwhelming probability in session $j$ (and therefore rejects
overall).


2 The term “random termination” was first used by Haitner [Hai09], but the random
termination verifier we consider already appeared in the earlier work of [CKPR01].

3 We use a well-known technique (see for example [GK96b, CKPR01]) to generate fresh
independent randomness on the fly for each query from the simulator $S$, despite the
fact that $S$ may rewind $V^{k*}$ between queries and force $V^{k*}$ to use the same random
tape. Let $H$ be a family of $q$-wise independent hash functions, and let $V^{k*}$ sample
one hash function $h \leftarrow H$ at the very beginning. Then whenever $V^{k*}$ receives a query
(from $P^k$ or $S$), $V^{k*}$ applies $h$ to the current protocol transcript (the sequence of
messages exchanged in the protocol so far) and uses the output as a fresh random
tape. Since $S$ makes at most $q$ queries to $V^{k*}$, the outputs of the hash function
on these queries are independent and uniformly random.


The core of our proof is to show that, with overwhelming probability, for every subset
of size $m$ of queries $T = \{\tau_{i_1}, \ldots, \tau_{i_m}\}$ made by $S$, there exists a session $j \in [k]$
such that rewinds are “not helpful” for session $j$
with respect to $T$ in the above manner. We make this possible by setting the
termination probability to $p = (1 - 1/q)$.

We now state the formal lemmas. Let $n$ be the security parameter and $L$ be
a language. Suppose there exists an $m(n) \in o\!\left(\frac{\log n}{\log \log n}\right)$-round argument
$\Pi = (P, V)$ for $L$ with perfect completeness and negligible soundness error. For any
polynomially bounded $k(n) \ge n$, let $S$ be a black-box zero-knowledge simulator
of the parallelized protocol $\Pi^k = (P^k, V^k)$ that makes at most


$$q = k^{1/m} / \log^2 n$$


queries, and let $V^{k*}$ be a random termination verifier of the parallelized protocol
with termination probability


$$p = \left(1 - \frac{1}{q}\right) = \left(1 - \frac{\log^2 n}{k^{1/m}}\right).$$


(These parameters pass the following sanity checks: $q$ is polynomially bounded
and $q \ge m$, that is, the simulator queries $V^{k*}$ at least once for each round of the
protocol. It is also useful later to know that $\binom{q}{m} \le q^m \le k$.) Then:


Lemma 8. On input $x \in L$, $D(x)$ accepts with probability $1 - \mathrm{ngl}(n)$, i.e., $S^{V^{k*}}(x)$
outputs an accepting view of $V^{k*}$ with probability $1 - \mathrm{ngl}(n)$.


Lemma 9. On input $x \notin L$, the probability that $S^{V^{k*}}(x)$ generates an accepting
view of $V^{k*}$ is negligible, i.e., $D$ has negligible soundness error.


We sketch the proof of Lemma 8 now, and give the proof of Lemma 9 in the
next section.


4 Without loss of generality, we may assume that before $S$ outputs a view of $V^{k*}$,
$S$ first queries $V^{k*}$ with the messages in the view (if $S$ hasn’t already). This may
increase the number of queries by $m$, and thus weaken the resulting lower bound
from $q$ to $q - m$. Nevertheless, this does not change our lower bound since $q = \omega(m)$
in Theorem 2.

Proof (Proof Sketch). Using the zero-knowledge property, the output of $S$ is
indistinguishable from the view of $V^{k*}$ in an execution with $P^k$. Therefore it is
enough to show that $(P^k, V^{k*})(x)$ accepts with probability 1. In each parallel
session $j \in [k]$, $V_j^*$ accepts by definition if it decides to terminate in some protocol
round. Otherwise, $V_j^*$ is identical to $V$ and would still accept with probability 1
because the original protocol $\Pi = (P, V)$ has perfect completeness.


4.2 Soundness of D 


Proof (Proof of Lemma 9). We prove Lemma 9 with a reduction. Suppose for the
sake of contradiction that $S$ convinces $V^{k*}$ on some input $x \notin L$ with probability
more than $1/p(n)$ for some polynomial $p(\cdot)$ (not to be confused with the termination
probability $p$). Using $S$, we construct a cheating prover
$P^*$ for the original protocol $\Pi = (P, V)$ that convinces $V$ with non-negligible
probability.

Before we start, assume without loss of generality that $S$ makes exactly $q$
queries, and that before $S$ outputs a view of $V^{k*}$, $S$ would first query $V^{k*}$ on
all previous messages in the view. For technical convenience, we let $V^{k*}$ make
a fresh decision to terminate for each query and each session, even if $V^{k*}$ has
already terminated previously in the same session. That is, regardless of history or
message content, for each query and each parallel session, $V^{k*}$ always terminates
independently with probability $p$.

Our $P^*$ is a natural extension of the classic reduction of [GK96b]: $P^*$
guesses a session $j_0 \in [k]$ and $m$ indices $T_0 = \{i_1, \ldots, i_m\} \subseteq [q]$ uniformly at
random, and interacts with an outside honest $V$ by internally simulating an
interaction of $(S, V^{k*})$ with $V$ embedded in session $j_0$, queries $\tau_{i_1}, \ldots, \tau_{i_m}$ of
$V^{k*}$. The idea of guessing a random query subset is exactly as
in [GK96b]. The difference is that the reduction in [GK96b] is for single session
protocols, whereas we reduce from parallel protocols to single session
protocols. Hence, our reduction $P^*$ guesses a random session as well.

In more detail, $P^*$ runs $S$ and $V^{k*}$ internally. It simulates $k - 1$ sessions of
$V^{k*}$ honestly (all except $V_{j_0}^*$). When simulating $V_{j_0}^*$ for the $i$-th query, where $S$ queries
$\tau_i$, $P^*$ first simulates (with fresh randomness) $V_{j_0}^*$’s decision on termination. If
$V_{j_0}^*$ decides to terminate but $i \in T_0$, or if $V_{j_0}^*$ does not terminate but $i \notin T_0$, $P^*$
aborts (in both these cases, the termination decision of $V_{j_0}^*$ is incompatible with
$P^*$’s choice of queries to forward). If the forwarded queries (index set $T_0$) are not
“consistent” (e.g., if they query for the same round of the protocol more than
once, or a query contains an inconsistent transcript), $P^*$ aborts as well. Note that
if $P^*$ does not abort, then $V^{k*}$ is perfectly simulated (even in session $j_0$).

Now consider the following best case scenario. Suppose that at the end of the
simulation, $S$ successfully outputs an accepting view of $V^{k*}$. Moreover, suppose
that the accepting view consists exactly of the queries in index set $T_0$ (this
automatically guarantees that the forwarded queries are consistent), and suppose
that $P^*$ does not abort (i.e., termination decisions are compatible with the
forwarded queries). Then, $P^*$ will have successfully convinced the outside honest
$V$. The rest of the proof is devoted to showing that this best case scenario occurs
with noticeable probability (roughly $1/(p \cdot k^2)$).

Let $T \subseteq [q]$ denote an index set $\{i_1, \ldots, i_m\}$ of size $m$. For an index set
$T \subseteq [q]$ and a session $j \in [k]$, we define $A(T, j)$ to be the event that
$V^{k*}$ terminates session $j$ on query $\tau_i$ iff $i \notin T$. Referring back to our intuition
earlier, $A(T, j)$ denotes the event that for session $j$, $S$’s rewinds are not helpful
with respect to the queries indexed by $T$. If event $A(T, j)$ holds, and $S$ uses the
queries indexed by $T$ to form an accepting view of $V^{k*}$, and $P^*$ guesses both
$T_0 = T$ and $j_0 = j$ in the beginning, then $P^*$ will have successfully convinced
the outside honest $V$.
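
The event $A(T, j)$ is a simple predicate on the termination decisions, as the following Python sketch shows. The table layout (`terminated[i][j]`, with queries and sessions indexed from 0) and the function name are assumptions made for illustration.

```python
def event_A(terminated, T, j):
    """terminated[i][j] is True iff the verifier terminated session j on
    query i. A(T, j) holds iff session j terminated on exactly the
    queries whose index lies outside T."""
    return all(terminated[i][j] == (i not in T)
               for i in range(len(terminated)))
```

For instance, with three queries and `T = {1}`, the event holds for a session that terminated on queries 0 and 2 but answered query 1.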

We claim that by the setting of parameters, we have


$$\Pr[\forall T \subseteq [q],\ \exists j \in [k] \text{ s.t. } A(T, j)] \ge 1 - \mathrm{ngl}(n) \qquad (1)$$


where $\mathrm{ngl}(n)$ denotes a negligible quantity in $n$ and $T$ ranges over index sets of
size $m$. In words, with overwhelming
probability, for every possible index set $T$ of size $m$ that $S$ may use to output a
view of $V^{k*}$, there exists a session $j$ such that $S$’s rewinds are not helpful with
respect to the queries indexed by $T$.

Before proving (1), we first use the claim to show that $P^*$ convinces $V$ with
noticeable probability. Recall that $S$ outputs an accepting view of $V^{k*}$ with
probability at least $1/p$. By a union bound, we have


$$\Pr[(S \text{ outputs accepting view of } V^{k*}) \wedge (\forall T \subseteq [q],\ \exists j \in [k] \text{ s.t. } A(T, j))] \ge (1/p) - \mathrm{ngl}(n).$$


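Note that when the above event holds, there exists a unique index set $\hat{T}$ of $m$ queries
used by $S$ to form an accepting view of $V^{k*}$, and there exists a session $\hat{j} \in [k]$
such that $A(\hat{T}, \hat{j})$ holds. As mentioned earlier, if $P^*$ guesses $j_0 = \hat{j}$ and $T_0 =
\hat{T}$ correctly, $P^*$ will have successfully convinced $V$. Since $P^*$ guesses $j_0$ and $T_0$
uniformly at random and independently of the interaction between $S$ and $V^{k*}$, we
have


$$
\begin{aligned}
\Pr[P^* \text{ convinces } V]
&\ge \Pr[(S \text{ convinces } V^{k*}) \wedge (\forall T \subseteq [q],\ \exists j \in [k] \text{ s.t. } A(T, j)) \\
&\qquad\quad \wedge\ (P^* \text{ guesses } \hat{T} \text{ and } \hat{j} \text{ correctly})] \\
&\ge \left(\frac{1}{p} - \mathrm{ngl}(n)\right) \cdot \frac{1}{k \cdot \binom{q}{m}}
\ \ge\ \left(\frac{1}{p} - \mathrm{ngl}(n)\right) \cdot \frac{1}{k^2},
\end{aligned}
$$

where in the last line we used $\binom{q}{m} \le q^m \le k$. This contradicts the fact that
$\Pi$ has negligible soundness error and completes our analysis.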

It remains to show (1). By definition, each session $j$ terminates on each query
$\tau_i$ with probability exactly $p$, independently of any other session or query. Hence,
for any session $j$ and index set $T$ of size $m$, the probability that event $A(T, j)$
holds is


$$\Pr[A(T, j)] = p^{q-m} (1 - p)^m \ge \left(1 - \frac{1}{q}\right)^{q} \left(\frac{1}{q}\right)^{m} \ge \Omega\!\left(\frac{1}{k} \log^{2m} n\right).$$


It follows that


$$\Pr[\exists j \in [k] \text{ s.t. } A(T, j)] \ge 1 - \left(1 - \Omega\!\left(\frac{1}{k} \log^{2m} n\right)\right)^{k} \ge 1 - e^{-\Omega(\log^{2m} n)}.$$


Finally, by a union bound, we have


$$\Pr[\forall T \subseteq [q],\ \exists j \in [k] \text{ s.t. } A(T, j)] \ge 1 - e^{-\Omega(\log^{2m} n)} \cdot \binom{q}{m} \ge 1 - \mathrm{ngl}(n),$$


as claimed.
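
The two probabilities in this calculation can be evaluated directly. The Python sketch below uses hypothetical function names, and the sample parameters in the note that follows are chosen only for illustration; they are not claimed to satisfy $q = k^{1/m}/\log^2 n$ for any particular $n$.

```python
import math

def prob_A(q, m):
    """Pr[A(T, j)] = p**(q - m) * (1 - p)**m with p = 1 - 1/q."""
    p = 1.0 - 1.0 / q
    return p ** (q - m) * (1.0 - p) ** m

def failure_bound(k, q, m):
    """Union bound on the probability that some size-m index set T has no
    session j with A(T, j): C(q, m) * (1 - Pr[A(T, j)])**k."""
    return math.comb(q, m) * (1.0 - prob_A(q, m)) ** k
```

For example, with $q = 10$ and $m = 2$ a fixed session satisfies $A(T, j)$ with probability about $0.0043$, and with $k = 10^4$ sessions the union bound over all $45$ index sets is already below $10^{-15}$.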


As with most lower bounds for black-box zero-knowledge, a careful reading
reveals that Theorem 2 also applies to more liberal definitions of zero-knowledge,
such as $\varepsilon$-zero-knowledge and zero-knowledge with expected polynomial time
simulators. Additionally, note that the proof of Lemma 9 never assumes that
$S$ is a zero-knowledge simulator, and works just as well for any PPT oracle
machine $S$.


Remark 10. By examining the technical inner workings of the proof of Canetti,
Kilian, Petrank and Rosen [CKPR01] (which also uses a random termination
verifier), we discovered that part of their analysis implicitly presents a lower
bound for the number of queries made by black-box simulators for parallel
zero-knowledge protocols. Compared with Theorem 2 and our analysis, the result of
[CKPR01] establishes a weaker bound (and is arguably more complicated); this
is not surprising, since establishing a parallel lower bound was not their goal.

Specifically, implicitly establishes a log’) (k) lower bound on the 
number of simulator queries, whereas we were able to establish a lower bound of
$k^{1/m}/\log^2 n$. Nevertheless, we believe that by adapting our parameters (which
may seem strange for their setting), their analysis could be strengthened to match
our lower bounds (we have not verified all the details, however).


Acknowledgments. We thank Iftach Haitner and Johan Håstad for useful
discussions in the early stages of this research.

Abstract. In this work, we study simultaneously resettable arguments
of knowledge. As our main result, we show a construction of a constant-round
simultaneously resettable witness-indistinguishable argument of
knowledge (simresWIAoK, for short) for any NP language. We also
show two applications of simresWIAoK: the first constant-round
simultaneously resettable zero-knowledge argument of knowledge in the Bare
Public-Key Model; and the first simultaneously resettable identification
scheme which follows the knowledge extraction paradigm.


1 Introduction 


Interaction and private randomness are the two fundamental ingredients in
Cryptography. They are especially important for achieving zero-knowledge proofs [15].
In [7], Canetti, Goldreich, Goldwasser and Micali showed that when private
randomness is limited and re-used in multiple instances of a proof system, it is still
possible to preserve the zero-knowledge requirement. The setting proposed by [7]
is that of a malicious verifier that resets the prover, thereby forcing the prover to
run several protocol executions using the same randomness. This setting applies
to protocols where the prover is implemented by a stateless device. Therefore,
a prover can only count on the limited (hardwired) randomness while it can
be adaptively reset any polynomial number of times. The resulting security
notion against such powerful verifiers is referred to as resettable zero knowledge
(rZK) and is provably harder to achieve than concurrent zero knowledge [18].
Feasibility results have been achieved in [7, 17] in the standard model with the
following round complexity: polylogarithmic for rZK and constant for resettable
witness indistinguishability (rWI, in short). Since then, it was also shown how
to achieve resettable zero knowledge in the Bare Public-Key (BPK) model,
introduced by Canetti et al. [7], where one can obtain better round complexity and
assumptions [902221]. Very recently, it has been shown [13] that resettable 
statistical zero knowledge for non-trivial languages is possible.

The “reverse” of the above question has been considered by Barak, Goldreich,
Goldwasser and Lindell [4], where a malicious prover resets a verifier; the
resulting notion is called resettable soundness. In [4], it has been shown how to
obtain resettable soundness along with ZK in a constant number of rounds.

Barak et al. [4] proposed the challenging simultaneous resettability conjecture,
where one would like to prove that a protocol is secure against both a
resetting malicious prover and a resetting malicious verifier. The existing machinery
turned out to be insufficient, and a definitive answer required almost a decade.
Deng, Goyal and Sahai [9] showed a resettably sound rZK
argument for NP with polynomial round complexity. Very recently, results in
the BPK model for simultaneous resettability have been obtained with
a constant number of rounds.


Arguments of knowledge under simultaneous resettability. Argument systems are
often used with a different goal than proving membership of an instance in a
language. Indeed, it is commonly required to prove knowledge (possession) of a
witness instead of the truthfulness of a statement. Since arguments of knowledge
serve as major building blocks in Cryptography (e.g., in identification schemes¹),
it is an interesting question whether the previous results for arguments of
membership extend to arguments of knowledge. Unfortunately, arguments of
knowledge have been achieved so far only when one party can reset. That is, we have
rZK arguments of knowledge [7] and, separately, resettably sound ZK arguments
of knowledge [4]. In contrast, when reset attacks are possible in both directions, no
result is known even when only rWI with resettable argument of knowledge is
desired.

It is important to note that resettable security for ZAPs comes almost for
free because of the minimal round complexity (1 or 2 rounds). However, it is
not known how to accommodate knowledge extraction, unless one relies on
non-standard (e.g., non-falsifiable) assumptions. For the case of resettably sound
rZK, all the above results [9, 8, 2] critically use an instance-dependent technique
along with ZAPs: when the statement is true (i.e., when proving rZK), the
prover/simulator can run ZAPs which allow the use of multiple witnesses. Such
use of multiple witnesses gives some flexibility that turns out to be very useful
to prove resettable zero knowledge. In contrast, when the statement is false, the
protocols are designed so that a malicious prover must stick with some
fixed messages during the execution of the protocol. Therefore, rewinding
capabilities do not help the resetting malicious prover since it cannot change those
fixed messages. This is critically used in the proofs of resettable soundness in
order to reach a contradiction when a prover proves a false statement. It is easy to
see that the above approach fails when arguments of knowledge are considered.
Indeed, when the malicious resetting prover proves a true statement, the same
freedom that allows one to prove rZK/rWI also gives extra power to the
malicious prover. Consequently, designing an extractor appears problematic and new
techniques seem to be needed so that the simultaneous resettability conjecture
is resolved even when we consider knowledge extraction.


1 Bellare et al. in [5] gave various definitions for identification schemes when the
adversary can also reset the proving device.


Our results. Our main result is the first construction of a constant-round 
simultaneously resettable witness-indistinguishable argument of knowledge² (in short, 
simresWIAoK) for any NP language. Our protocol is based on the novel use of 
ZAPs and resettably sound zero-knowledge arguments, which improves over the 
techniques previously used in [9,8] as well as concurrent and independent work 
of [16]. 

We show several applications of our main result. First, we show that by 
combining two executions of our protocol for simresWIAoK, we obtain a 
constant-round simultaneously resettable zero-knowledge argument of knowledge in the 
BPK model. This improves the results of [8,2], which do not enjoy witness 
extraction with respect to adversarial resetting provers. 

As another application of our main protocol, we also consider the question of 
secure identification under simultaneous resettability and show how to use the 
above simresWIAoK to obtain the first simultaneously resettable identification 
scheme which follows the knowledge extraction paradigm. We describe it by 
extending the work of Bellare et al. [5]. 

In addition, in the full version of this paper, we show how to obtain a constant- 
round resettably sound concurrent zero knowledge argument of knowledge in the 
BPK model by relying on collision-resistant hash functions only (CRHFs, for 
short) (i.e., we do not require ZAPs, and thus trapdoor permutations). 


Notation. We denote by $n \in \mathbb{N}$ the security parameter and by PPT the property 
of an algorithm of running in probabilistic polynomial time. A function $\epsilon$ is 
negligible in $n$ (or just negligible) if for every polynomial $p(\cdot)$ there exists a 
value $n_0 \in \mathbb{N}$ such that for all $n > n_0$ it holds that $\epsilon(n) < 1/p(n)$. We denote 
by $x \leftarrow D$ the sampling of an element $x$ from the distribution $D$. We also 
use $x \xleftarrow{\$} A$ to indicate that the element $x$ is sampled from the set $A$ according to 
the uniform distribution. Let P, V be interactive Turing machines; we denote 
by $\langle P(\cdot), V(\cdot) \rangle(x)$ the random variable representing the local output of V when 
interacting with P, where $x$ is the common input and the randomness of each 
machine is uniformly and independently chosen. 
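
For reference, the negligibility condition can be written in quantifier form. The display below only restates the definition given in the Notation paragraph and introduces no new symbols.

```latex
\[
\epsilon \text{ is negligible in } n
\iff
\forall \text{ polynomial } p(\cdot)\;\; \exists\, n_0 \in \mathbb{N}\;\; \forall\, n > n_0 :\quad
\epsilon(n) < \frac{1}{p(n)} .
\]
```

For instance, $\epsilon(n) = 2^{-n}$ satisfies this condition, whereas $\epsilon(n) = 1/n^2$ does not (take $p(n) = n^3$).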


Blum’s protocol. We will use the 3-round WIPoK protocol of Blum [6] for the 
NP-complete language Graph Hamiltonicity (HC) as the main ingredient of our 
construction. We refer to Blum’s protocol as BL and to its three rounds as 
BL1, BL2, BL3. 


2 In this work, we will never consider the case of resettable soundness along with 
non-resettable argument of knowledge. Therefore, each time we mention together 
resettable soundness and argument of knowledge, we mean that both soundness and 
witness extraction hold against a malicious resetting prover. 

In a very recent and independent work [16], Goyal and Maji achieved simultaneously 
resettable secure computation. Their work achieves (with simulation-based security) 
simultaneous resettability with polynomial round complexity assuming also the 
existence of lossy trapdoor encryption. 

2 Resettably Sound rWI Arguments of Knowledge 


Our goal is to obtain a construction that is resettably-sound resettable WI and 
a resettable argument of knowledge in a constant number of rounds. The only 
known constant-round simultaneously-resettable WI protocol is rZAP, which is 
not an argument of knowledge and, as discussed previously, there is not much 
hope to transform it into an argument of knowledge (even without considering 
resettability). 


A typical paradigm: determining message and consistency proof. Typically, 
protocols dealing with a resetting adversary ([7,4,9]) rely on the following paradigm: 
the resetting party is required to provide a special message (called the determining 
message) that determines her own actions for the rest of the protocol. Namely, 
for each protocol message the resetting party is required to prove that such 
message is consistent with the determining message (we call this proof a consistency 
proof). Moreover, the actual randomness used by the honest party in the 
protocol depends on the determining message (typically the honest party applies 
a pseudorandom function (PRF) to it). The combination of the randomness 
depending on the determining message and the consistency proof given by the 
resetting party suppresses the resetting power of the adversary. Indeed, due to 
the consistency proof, the resetting party cannot change a message previously 
played without first having changed the determining message (unless she is able 
to fake the consistency proof). However, if she changes the determining message, 
then the honest party plays the protocol with (computationally) fresh 
randomness (unless the pseudorandomness of the PRF is violated). We will follow this 
paradigm to construct our simultaneously resettable witness-indistinguishable 
argument of knowledge as well. Recall that, as specified above, we do not know 
how to start from rZAPs, which are already simultaneously resettable, and 
transform them into arguments of knowledge. Our starting point is Blum’s proof of 
knowledge [6]. In the following discussion we show incrementally how to 
transform this protocol to enjoy resettable witness indistinguishability and resettable 
soundness (this transformation is already known in the literature), and finally present 
our novel technique to also obtain a resettable argument of knowledge. 
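
The effect of deriving the honest party's coins from the determining message can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as a stand-in for the PRF, and the class `ResettableParty` and its method names are hypothetical; the paper does not fix a concrete PRF or interface.

```python
import hashlib
import hmac


def prf(seed: bytes, data: bytes) -> bytes:
    """Stand-in PRF (assumption): HMAC-SHA256 keyed with the fixed random tape."""
    return hmac.new(seed, data, hashlib.sha256).digest()


class ResettableParty:
    """Toy honest party whose per-session coins depend on the determining message."""

    def __init__(self, random_tape: bytes):
        # A reset restores exactly this state, so the tape itself never changes.
        self.random_tape = random_tape

    def session_coins(self, determining_message: bytes) -> bytes:
        # Same determining message -> same coins; a new one -> fresh-looking coins.
        return prf(self.random_tape, determining_message)


party = ResettableParty(b"\x07" * 32)
coins_first = party.session_coins(b"determining message A")
coins_after_reset = party.session_coins(b"determining message A")
coins_new_message = party.session_coins(b"determining message B")
```

Resetting the party and replaying the same determining message reproduces the same coins, so the adversary learns nothing new; changing the determining message yields coins that look independent as long as the PRF is secure.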


Resettable WI and stand-alone argument of knowledge [4]. When the verifier 
can reset the prover, following the above paradigm, it is easy to construct a 
resettable WI system starting from Blum’s protocol. In Blum’s protocol the 
only message from V to P is the challenge. The modified resettable version 
requires that V sends a statistically binding commitment of the challenge as 
determining message. The only other protocol message of V is the opening of 
the commitment which, due to the binding property, is itself a proof that the 
message is consistent with the determining message. Note that such a modified 
protocol is no longer an argument of knowledge, since the extractor has the same 
power as the malicious verifier. In order to allow only the extractor to cheat, the 
next step is to avoid the opening as a proof of consistency. Instead of the actual 
opening of the commitment, V is required to send the challenge along with a 
resettably sound (non-black-box) ZK argument ([3]). The (non-black-box) extractor can 
send an arbitrary challenge and prove consistency with the determining message 
by using the (stand-alone) non-black-box simulator (recall that only V might 
reset here). The resulting protocol is resettable WI and a (stand-alone) argument 
of knowledge (rWIAoK for short), and it is known from [4]. 

We use a modified version of such protocol. We require that the commitment 
sent by the verifier is statistically hiding (instead of statistically binding), and 
we use the statistical zero-knowledge argument of knowledge of [20]. 


Achieving Resettable Soundness and Resettable Argument of Knowledge: existing 
solutions do not work. We now deal with the case in which the prover can also 
reset. By the BGGL compiler [4], we know that any constant-round public-coin 
WI argument system can be upgraded to resettable soundness by simply 
requiring the honest verifier to apply a PRF to the first message received from 
the prover. However, since our aim is to obtain simultaneous resettability, we 
need to start from the rWIAoK protocol shown before, which is not public coin. 
Thus, following the paradigm and the technique of [9], we require that as first 
message, P sends the commitment of the randomness that will be used in the 
protocol: this is the determining message. Then, along with each protocol message, P 
proves that the message is honestly computed using the randomness committed 
in the determining message: this is the consistency proof. Since we are now in 
the setting in which both parties can reset each other, the consistency proof must 
be provided with a simultaneously resettable tool. For this purpose we use rZAPs, 
which are constant-round simultaneously resettable WI proofs. We refer to the 
theorem to be proved with the rZAP as the “consistency theorem”, since P proves that 
a message is honestly computed and consistent with the randomness committed 
in the determining message. 

The technical problem with using rZAPs is that, since they only guarantee WI, the 
theorem being proved is required to have more than one witness (note that the 
simultaneously resettable protocol of [9] cannot be used here since we aim for a 
constant-round construction). Recall that we want to use rZAP to provide the 
proof of consistency with the determining message. If the determining message is 
a statistically binding commitment of the randomness, then there exists a unique 
opening, which implies the existence of only one witness. On the other hand, if 
we use a statistically hiding commitment, then any opening is a legitimate 
witness, the theorem is always true and the benefit of the determining message 
vanishes. The solution to overcome this problem is to change the theorem to be 
proved with rZAP so that it admits more than one witness. 

In [9] the consistency theorem is augmented with the theorem “$x \in L$”, which 
we call the “trapdoor theorem”, recalling the FLS paradigm [12] but with a different 
purpose. We call it trapdoor to stress that it is an escape for the prover, who 
can pass the consistency proof while essentially having freedom to change messages 
among resets. Hence in [9,8], along with each protocol message, P is required to 
prove that either the protocol message is computed honestly with the randomness 
committed in the determining message (i.e., the “consistency theorem”) or $x \in L$ 
(i.e., the “trapdoor theorem”). 

This solution can be seen as an instance-dependent technique. Indeed, it is 
easy to see that a malicious prover can play messages inconsistently with the 
determining message and still pass the consistency check, therefore exploiting its 
resetting power, only when $x \in L$. Instead, when proving soundness, since $x \notin L$, 
the trapdoor theorem is false; hence, due to the soundness of rZAPs, the malicious 
prover is forced to play according to the determining message, therefore honestly 
following the protocol specifications. 

Unfortunately, such an instance-dependent solution suffices to prove resettable 
soundness but fails completely when one would like to prove witness extraction 
(i.e., the argument of knowledge property). The reason is that, when proving 
witness extraction, we have to construct an extractor that works against any 
malicious prover, even one who uses the witness of the trapdoor theorem when 
proving consistency of the protocol messages. This possible behavior harms the 
extractor in two ways (recall that the witness can be computed from two distinct 
transcripts of Blum’s protocol that have the same first message): 1) upon seeing 
the challenge of the verifier/extractor, P resets it and changes the first message of 
Blum’s protocol according to the challenge; 2) P acts as a resetting verifier in the 
non-black-box ZK protocol, therefore preventing the extractor from using the 
stand-alone non-black-box simulator. Even though this is not harmful for the soundness 
property (a malicious prover can perform this attack only when $x \in L$), this 
attack kills the existence of the extractor. Therefore the above construction is 
only resettable WI and resettably sound. In conclusion, the instance-dependent 
technique of [9] inherently prevents the existence of any extractor. New ideas are 
required to solve the problem. 


Achieving Resettable Argument of Knowledge: the new technique. We propose 
a new “trapdoor” theorem that forces the resetting prover to honestly follow the 
protocol regardless of whether $x \in L$ or not. 

The idea is the following. We require P to run two parallel executions of the 
rWIAoK shown above, which we denote as sub-protocols $\pi_0, \pi_1$. In the determining 
message, in addition to the commitment of the random tape that will be used 
to run each sub-protocol, we require that P commits to a single bit. Then, the 
trapdoor theorem in sub-protocol $\pi_d$ will be the following: “$d$ is the bit committed 
in the determining message”. Since in the determining message there is only one 
bit committed (the other two are commitments of random tapes), due to the 
statistical binding property of the commitment, the trapdoor theorem is true in 
only one sub-protocol. Hence, in at least one of the sub-protocols the trapdoor 
theorem is false regardless of whether $x \in L$ or not, and in such sub-protocol P 
is forced to honestly follow the rWIAoK protocol, playing consistently with the 
determining message. 

More specifically, the final protocol goes as follows. P first sends the 
determining message, which consists of the statistically binding commitment of the 
random tapes that will be used in each sub-protocol and of a single bit. Each 
sub-protocol is augmented with rZAPs sent by P to V in which P proves 
consistency with the determining message. Therefore, in each sub-protocol $\pi_d$, along 
with each message of the rWIAoK protocol, P provides an rZAP for the 
following compound theorem: either the message is honestly computed and consistent 
with the determining message, or $d$ is the bit committed in the determining 
message. Finally, the verifier will accept the proof if and only if both sub-protocol 
executions are accepting. 

It is easy to see that no malicious prover can escape from following the 
determining message in at least one of the sub-protocols. Indeed, let $b$ be the 
bit committed in the determining message. On the one hand, in sub-protocol $\pi_b$, 
a malicious P is not forced to be honest and can then use the resetting power 
to prove any false theorem (indeed, among resets P can change the protocol 
messages without changing the determining message). On the other hand, in 
sub-protocol $\pi_{\bar{b}}$, the trapdoor theorem is false, thus the only way to provide 
an accepting rZAP is to follow the honest behavior, playing messages derived 
from the determining message. Therefore, in sub-protocol $\pi_{\bar{b}}$, the extractor is 
guaranteed that 1) for sessions starting with the same determining message, the 
first round of Blum’s protocol does not change, so that playing with two distinct 
challenges yields the extraction of the witness; 2) the extractor can run the 
stand-alone non-black-box ZK simulator without being detected. Hence we have 
the following: sub-protocol $\pi_{\bar{b}}$ is resettably sound and a resettable argument of 
knowledge, while sub-protocol $\pi_b$ is not sound. Note that in both sub-protocols 
the resettable WI property is still preserved. 
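
The case analysis on the committed bit can be checked with a small truth-table sketch. It idealizes the tools used in the text: the commitment is treated as perfectly binding (a single committed bit) and the rZAP as perfectly sound (only true statements are provable). All function names are hypothetical.

```python
def trapdoor_theorem(committed_bit: int, d: int) -> bool:
    """True iff d is the bit committed in the determining message."""
    return committed_bit == d


def compound_theorem(consistent: bool, committed_bit: int, d: int) -> bool:
    """Statement proved with each rZAP in sub-protocol d: consistency OR trapdoor."""
    return consistent or trapdoor_theorem(committed_bit, d)


def forced_honest_subprotocols(committed_bit: int) -> list:
    """Sub-protocols in which an inconsistent message makes the compound theorem false."""
    return [d for d in (0, 1) if not compound_theorem(False, committed_bit, d)]


# Whatever bit is committed, exactly one sub-protocol leaves no escape route.
forced_when_0 = forced_honest_subprotocols(0)
forced_when_1 = forced_honest_subprotocols(1)
```

In this idealized model the prover is pinned to the determining message in the sub-protocol indexed by the complement of the committed bit, independently of whether the statement being proved is true.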


2.1 Formal Construction of simresWIAoK 


We formally describe how to build a constant-round simultaneously resettable 
WI AoK (simresWIAoK) starting from Blum’s protocol (BL protocol). We 
denote by SHCom a two-round statistically hiding commitment scheme. We denote 
by SBCom the commitment procedure of a non-interactive statistically binding 
commitment scheme. We denote by $c \leftarrow \mathsf{SBCom}(v, s)$ (resp. SHCom) the output 
of the commitment of the value $v$ computed with randomness $s$. We use the 
resettably-sound statistical (non-black-box) ZK AoK of [20], which we denote by 
resSZK. In our construction, we require that P, at each round of the protocol 
(except the last, which is the opening of commitments as required by the BL protocol), 
provides a proof that either the messages are honestly computed according to 
the randomness committed in the first round, or the “trapdoor” condition is 
satisfied. Formally, P provides rZAPs for the following NP languages (except the 
language $\Lambda_{\mathsf{SHCom}}$, which is proved only by V using the resSZK protocol). 


$\Lambda_{\mathsf{BL1}}$: correctness and consistency of the first round of Blum’s protocol (BL1). A 
tuple $(x, m, c_{r_b}, c_b) \in \Lambda_{\mathsf{BL1}}$ if there exist $(r_b, s_b)$ such that $c_{r_b} = \mathsf{SBCom}(r_b, s_b)$ 
and $m$ is honestly computed according to BL1 for the graph $x$ using 
randomness $f_{r_b}(c_b)$. 

$\Lambda_V$: correctness and consistency of the verifier’s messages of the protocol resSZK. 
A tuple $(m_P, m_V, c_{r_b}, c_b) \in \Lambda_V$ if there exist $(r_b, s_b)$ such that 
$c_{r_b} = \mathsf{SBCom}(r_b, s_b)$ and $m_V$ is honestly computed according to the verifier’s 
procedure of the protocol resSZK, having as input the prover’s message $m_P$ ($m_P$ 
corresponds to the concatenation of all messages played by the prover so 
far), using randomness $f_{r_b}(c_b)$. 

$\Lambda_{\mathsf{trap}}$: trapdoor theorem (true only for sub-protocol $b$). The pair $(c_s, b) \in \Lambda_{\mathsf{trap}}$ 
if there exists $s$ such that $c_s = \mathsf{SBCom}(b, s)$. 

$\Lambda_{\mathsf{SHCom}}$: validity of the opening (proved by V). The pair $(c_s, m) \in \Lambda_{\mathsf{SHCom}}$ if there 
exists $s$ such that $c_s = \mathsf{SHCom}(m, s)$. Note that for a statistically hiding 
commitment scheme, any pair $(c_s, m)$ is actually in $\Lambda_{\mathsf{SHCom}}$. Nevertheless, V 
proves this theorem using the argument of knowledge resSZK. 


Protocol simresWIAoK consists of two phases (see Fig. 1). In the first phase, P 
and V generate the random tapes that they will use to run the sub-protocols. 
P sends V the commitments $c_{r_0}, c_{r_1}$ of two random strings $r_0, r_1$ and the 
commitment $c_s$ of a random bit $b$. This message is the determining message, to 
which V applies a PRF to generate a pseudo-random tape (to be used to 
execute the sub-protocols). The second phase consists of a parallel execution of $\pi_0$ 
and $\pi_1$ (see Fig. 2). P runs each sub-protocol on theorem $x$, randomness $r_0, r_1$, 
and the witnesses for computing the rZAPs as inputs (i.e., the openings of the 
commitments of the determining message). V runs each sub-protocol using the 
pseudo-random tapes determined by the determining message received from P. 
Each sub-protocol is resettable WI, while only one of the two sub-protocols is 
resettably sound and a resettable AoK. Since V accepts the proof only if both 
executions are accepting, the final protocol is also a resettably-sound resettable 
AoK. 


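Protocol simresWIAoK 

Inputs: common input $x \in \mathsf{HC}$. 
P’s input: witness $y$, randomness $\omega$. V’s input: randomness $r$. 

P: $b \xleftarrow{\$} \{0,1\}$; $r_0, r_1, s_0, s_1 \xleftarrow{\$} \{0,1\}^n$. 
   Send $c_{r_0} \leftarrow \mathsf{SBCom}(r_0, s_0)$, $c_{r_1} \leftarrow \mathsf{SBCom}(r_1, s_1)$, $c_s \leftarrow \mathsf{SBCom}(b, s)$. 
   Run in parallel $\pi_0^P(x, y, r_0, s_0)$; $\pi_1^P(x, y, r_1, s_1)$. 
V: upon receiving $\mathsf{dm} = (c_{r_0}, c_{r_1}, c_s)$ from P: 
   $R_{V_0} \leftarrow f_r(x \| c_{r_0} \| c_s)$; $R_{V_1} \leftarrow f_r(x \| c_{r_1} \| c_s)$. 
   Run in parallel $\pi_0^V(x, R_{V_0})$; $\pi_1^V(x, R_{V_1})$. 

Fig. 1. Simultaneously Resettable Argument of Knowledge 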
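
The first phase of Fig. 1 can be sketched as follows. This is only an illustration under loud assumptions: `commit` is a hash-based stand-in that is NOT statistically binding (the protocol requires SBCom to be), HMAC-SHA256 stands in for the PRF, the randomness `s` for the bit commitment is sampled like the other strings, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(value: bytes, randomness: bytes) -> bytes:
    """Hash-based stand-in for SBCom (assumption: not statistically binding)."""
    return hashlib.sha256(len(value).to_bytes(4, "big") + value + randomness).digest()


def prf(seed: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as a stand-in for the PRF f_r (assumption)."""
    return hmac.new(seed, data, hashlib.sha256).digest()


def prover_first_message(n_bytes: int = 32):
    """P samples b, r_0, r_1, s_0, s_1, s and commits to r_0, r_1 and b."""
    b = secrets.randbelow(2)
    r0, r1, s0, s1, s = (secrets.token_bytes(n_bytes) for _ in range(5))
    dm = (commit(r0, s0), commit(r1, s1), commit(bytes([b]), s))
    return dm, (b, r0, r1, s0, s1, s)


def verifier_tapes(r: bytes, x: bytes, dm):
    """V derives R_V0 = f_r(x || c_r0 || c_s) and R_V1 = f_r(x || c_r1 || c_s)."""
    c_r0, c_r1, c_s = dm
    return prf(r, x + c_r0 + c_s), prf(r, x + c_r1 + c_s)


verifier_seed = b"\x01" * 32
dm, opening = prover_first_message()
tapes = verifier_tapes(verifier_seed, b"graph-x", dm)
```

Replaying the same determining message to a reset verifier gives back the same pair of tapes, while a freshly generated determining message leads to a different pair.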


The sub-protocol $\pi_d$ is described in Fig. 2. We omit the first round of the rZAP 
and the first round of the statistically hiding commitment scheme SHCom. rZAPs 
are computed with independent randomness. We stress that the determining 
message for V is the first prover’s message: $\mathsf{dm} = (c_{r_0}, c_{r_1}, c_s)$. The determining 
message for P is the first verifier’s message: $(c_0, c_1)$. 

Sub-protocol: $\pi_d = (\pi_d^P(x, y, r_d, s_d), \pi_d^V(x, R_{V_d}))$. 

Inputs: common input: $x$ ($\in \mathsf{HC}$). P’s input: witness $y$ for $R_{\mathsf{HC}}$; witness $(r_d, s_d)$ to 
prove rZAP’s consistency theorem. V’s input: randomness $R_{V_d}$. Protocols BL [6] and 
resSZK [20] are used as sub-protocols. 

- V: Pick challenge for BL protocol: $ch_d \xleftarrow{\$} \{0,1\}^n$. Send $c_d \leftarrow \mathsf{SHCom}(ch_d)$ to P. 
- P: upon receiving $c_d$ (this is the determining message for P): 
  1. Generate randomness $R_{P_d} \leftarrow f_{r_d}(x \| c_d)$. 
  2. Compute the step BL1 for the instance $x$ using randomness $R_{P_d}$. Let us 
     denote the output as $\mathsf{mBL1}^d$. 
  3. Send $\mathsf{mBL1}^d$ to V along with the rZAP for theorem: 
     $((x, \mathsf{mBL1}^d, c_{r_d}, c_d) \in \Lambda_{\mathsf{BL1}} \lor (c_s, d) \in \Lambda_{\mathsf{trap}})$. 
- V: if rZAP is accepting send $ch_d$ to P. 
  Prove theorem $(c_d, ch_d) \in \Lambda_{\mathsf{SHCom}}$ using resSZK protocol. Let $m_P$ be the 
  prover’s message of sub-protocol resSZK (sent by V to P) and $m_V$ be the 
  verifier’s message of resSZK (sent by P to V): 
  1. (P → V) at each round of the protocol resSZK, upon receiving $m_P$ from 
     V, P computes $m_V$ using randomness $R_{P_d}$ and sends $m_V$ to V along 
     with an rZAP for the theorem $((m_P, m_V, c_{r_d}, c_d) \in \Lambda_V \lor (c_s, d) \in \Lambda_{\mathsf{trap}})$. 
  2. (V → P) at each round of the protocol resSZK, upon receiving $m_V$ from 
     P, if rZAP is accepting V computes the next resSZK’s prover message and 
     sends it to P. Otherwise it aborts. 
- P: upon successfully completing the resSZK protocol compute step BL3 and 
  send the message $\mathsf{mBL3}^d$ to V. 
- If $\mathsf{mBL3}^d$ is the correct third message of BL protocol V outputs accept, else 
  outputs abort. 

Fig. 2. Sub-protocol $\pi_d = (\pi_d^P(\cdot), \pi_d^V(\cdot))$ 


2.2 Security Proof 


In this section we provide the high-level proof of the simultaneous resettable 
witness indistinguishability property and the resettable argument of knowledge 
property of the protocol depicted in Fig. 1. 


Resettable-soundness. Towards showing resettable soundness we start with the 
following observations. Recall that by dm we denote the determining message 
sent by P* in the first round, consisting of the commitment of two random seeds 
and the commitment of a bit (let us call the committed bit $b$). 


1. The randomness used by V depends on dm. In a resetting attack, a malicious 
prover P* activates V by selecting a theorem and randomness, denoted by 
$(x, j)$, which forces V to run with the same randomness $r_j$ among several 
executions. However, the randomness actually used by V in each session is 
determined by the output of the PRF on seed $r_j$ and input $(x, \mathsf{dm})$. Thus, 
even if activated with the same random tape $r_j$, when receiving a new 
determining message, V executes the protocol with a fresh pseudo-random tape. 
Note that, due to the computational indistinguishability of the PRF, 
soundness holds against a computationally bounded adversary. 

2. In sub-protocol $\pi_b$ the resetting power of P* is effective, since P* can 
honestly prove the trapdoor theorem of the rZAP. Therefore, P* is not forced 
to use the randomness committed in the determining message among 
multiple resetting attacks. Specifically, P* can mount the following attack. P* 
initiates a session labelled by $(x, j, \mathsf{dm})$. In the sub-protocol $\pi_b$, upon the 
reception of challenge $ch_b$ from V, P* resets V (while keeping the same 
determining message) back to the second round (the point after V has sent 
the commitment of the challenge). Then, P* changes the message $\mathsf{mBL1}^b$ 
according to the challenge $ch_b$ previously seen. This is possible using the 
trapdoor theorem, therefore P* does not need to stick with the randomness 
committed in the determining message. Since the determining message is the 
same as before the reset, V will use the same challenge in the sub-protocol 
$\pi_b$. Thus, in this sub-protocol, P* can prove any theorem by obtaining the 
challenge in advance, and thus $\pi_b$ is not resettably sound. 

3. In sub-protocol $\pi_{\bar{b}}$ the trapdoor theorem is always false, thus resetting V 
is ineffective. Indeed, in order to provide an accepting transcript, P* must 
provide an rZAP that only exists when the “consistency” theorem is true, that 
is, each of P*’s messages is honestly computed according to the randomness 
committed in the determining message. By the statistically binding property 
of SBCom (there exists only one opening for the commitments $c_s$ and $c_{r_{\bar{b}}}$) and 
the soundness of rZAP (even an unbounded P* cannot prove a false theorem), 
P* must be consistent with the randomness committed in the determining 
message. Therefore, $\pi_{\bar{b}}$ is resettably sound. 


Assume that there exists a PPT malicious prover P* and a pair $(x, j)$ such that 
V accepts $x$ with non-negligible probability for some $x \notin \mathsf{HC}$. By observation 
1, such a transcript is indexed by the determining message dm. Thus, the accepting 
transcript can be labelled by the triple $(x, j, \mathsf{dm})$. By observation 2, for the same 
determining message dm, there are polynomially many distinct transcripts for 
sub-protocol $\pi_b$ (P* can reset V polynomially many times and change the 
protocol messages). All these (partial) transcripts of $\pi_b$ can be accepting for $x \notin \mathsf{HC}$ 
since soundness does not hold for $\pi_b$. However, by observation 3, for a fixed triple 
$(x, r_j, \mathsf{dm})$, there exists only one possible accepting transcript for sub-protocol 
$\pi_{\bar{b}}$, since P* is forced to honestly follow the BL protocol according to the 
randomness committed in the determining message. Therefore the soundness of BL 
is preserved when P* resets V in $\pi_{\bar{b}}$. Since V accepts if and only if the executions 
of both sub-protocols are accepting, protocol simresWIAoK is resettably sound. 


Resettable argument of knowledge. To prove resettable argument of knowledge we 
show an expected PPT extractor that extracts the witness from any malicious 
prover P* with probability that is negligibly close to the probability that P* 
convinces an honest verifier. Let $(x, j, \mathsf{dm})$ be the label of the session in which P* 
provides an accepting proof. The goal of the extractor is to obtain two accepting 
transcripts with the same BL1 message and two distinct challenges (for at least 
one sub-protocol) for the same label. 

Our extractor consists of two phases. In the first phase it follows the honest 
verifier procedure. When P* has completed its execution, if there exists an ac- 
cepting session labeled by (x, j,dm) that we call “target session”, the extractor 
proceeds to the second phase. In the second phase, the extractor obtains a dis- 
tinct accepting transcript for the target session by cheating in the “opening” of 
the commitment by sending a challenge that is distinct from the one sent in the 
first phase and simulating the zero knowledge proof given by the verifier. 

The crucial step of this phase is to detect the sub-protocol in which P* is stuck 
with the randomness committed in dm and must follow the protocol honestly. 
Indeed, in such sub-protocol, the extractor can use the stand-alone simulator 
and open the statistically hiding commitment to any challenge. Note that the 
non-black-box simulator of the protocol resSZK takes as input the code of the 
malicious verifier. Thus, in order to use the simulator, the extractor must 
carefully prepare a machine which internally handles the interaction with P* and 
forwards to the simulator only the messages belonging to the resSZK protocol 
played in one of the sub-protocols. One of the tasks of such a machine is detecting 
the sub-protocol in which P* is forced to be honest. Once the right sub-protocol 
has been detected, by the statistically-hiding property of SHCom, and by the 
statistical zero-knowledge property of protocol resSZK run by V instead of the 
opening, we are guaranteed that upon each rewind, P* provides another 
accepting transcript for the target session with the same probability as in the first phase. 
Finally, by the proof of knowledge property of Blum’s protocol, collecting two 
distinct transcripts allows the extractor to compute the witness. The actual 
extractor requires an intermediate estimation step (as shown in [14]) in which the 
probability of having another accepting transcript for the label $(x, j, \mathsf{dm})$ is 
estimated. More details on the formal description of the extractor, the augmented 
machine and the formal proof can be found in the full version of this work. 
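
The last step, computing the witness from two transcripts of Blum’s protocol that share the first message, can be sketched for a single repetition. The sketch assumes the textbook form of Blum’s protocol, in which one challenge is answered by revealing the vertex permutation and the other by opening a Hamiltonian cycle of the permuted graph; commitments and openings are omitted, and all names are hypothetical.

```python
def extract_hamiltonian_cycle(perm, cycle_in_permuted_graph):
    """Map a cycle of the permuted graph back to the original graph.

    perm[v] is the vertex that v is renamed to in the committed (permuted)
    graph, as revealed in the answer to one challenge; the cycle is the one
    opened in the answer to the other challenge.
    """
    inverse = {image: v for v, image in enumerate(perm)}
    return [inverse[u] for u in cycle_in_permuted_graph]


def is_hamiltonian_cycle(n, edges, cycle):
    """Check that `cycle` visits every vertex once and uses only graph edges."""
    if sorted(cycle) != list(range(n)):
        return False
    edge_set = {frozenset(e) for e in edges}
    return all(
        frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set for i in range(n)
    )


# Toy instance: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
perm = [2, 0, 3, 1]            # answer to the "reveal the permutation" challenge
permuted_cycle = [0, 3, 1, 2]  # answer to the "open a cycle" challenge
witness = extract_hamiltonian_cycle(perm, permuted_cycle)
```

With an n-bit challenge, two distinct challenges differ in at least one repetition, and that repetition supplies both answers for the same committed graph, which is why a fixed first message is essential for extraction.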


Resettable witness indistinguishability. Recall that the protocol mainly consists 
of a single message from P to V, the determining message $(c_{r_0}, c_{r_1}, c_s)$, and the 
parallel execution of $\pi_0$ and $\pi_1$. Such a protocol can be seen as a parallel repetition 
of $(\Pi_0, \Pi_1)$, where $\Pi_b$ is the protocol $\pi_b$ augmented with the message $(c_s, c_{r_b})$ 
sent from P to V, and $b = 0, 1$. 

Assume that there exists a resetting PPT distinguisher V* for $(\Pi_0, \Pi_1)$. That 
is, V* distinguishes whether P runs both protocols using witnesses sampled from 
distribution $Y_0 = \{y^0(\bar{x})\}_{\bar{x}}$ or from distribution $Y_1 = \{y^1(\bar{x})\}_{\bar{x}}$. Let us denote by 
$H_{0,0}$ the experiment in which P uses witnesses sampled from $Y_0$ when running 
both protocols $(\Pi_b, \Pi_{\bar{b}})$, where $b$ is the bit committed in $c_s$, and by $H_{1,1}$ the 
experiment in which P uses witnesses sampled from $Y_1$ in both $(\Pi_b, \Pi_{\bar{b}})$. We 
prove by hybrid arguments that experiments $H_{0,0}$ and $H_{1,1}$ are computationally 
indistinguishable. Let $n$ denote the number of theorems and $t$ the bound on the 
prover’s random tapes. Consider the following hybrids. 


$H_{1,0}$: In this hybrid, in each session, P uses witnesses sampled from $Y_1$ to run 
protocol $\Pi_b$, where $b$ is the bit committed in the determining message of that 
session. The only difference between experiments $H_{1,0}$ and $H_{0,0}$ is in the 
witness used in $\Pi_b$. Assume that there exists a distinguisher between 
hybrids $H_{0,0}$ and $H_{1,0}$; then it is possible to construct an adversary $V^*_{\mathsf{BL}}$ for 
the WI property of sub-protocol BL of $\Pi_b$. Note that, when $b$ is the bit 
committed in the determining message, the trapdoor theorem is true in $\Pi_b$. 
$V^*_{\mathsf{BL}}$, on input $(z, Y_0, Y_1)$, runs V* as a sub-routine and honestly executes the 
protocol $\Pi_{\bar{b}}$ using the witness belonging to $Y_0$. Instead, for the execution of 
$\Pi_b$ it forwards the messages received from V* and belonging to the BL protocol 
to the external prover, while it simulates the remaining messages 
belonging to $\Pi_b$. The first difficulty in such a reduction seems to be the fact that 
V* can mount a reset attack, asking the prover of $\Pi_b$ to run with the same 
randomness while changing the challenge of the BL protocol. Instead, $V^*_{\mathsf{BL}}$ can 
only mount a concurrent attack against the external BL prover. 
Nevertheless, $V^*_{\mathsf{BL}}$ can replicate the same attack as V* for the following reasons. The 
randomness of the honest prover executing protocol $\Pi_b$ is computed on the 
determining message (the commitment of BL’s challenge) received from V*. 
Due to the pseudo-randomness of the PRF, when V* changes the determining 
message the prover of $\Pi_b$ plays with fresh randomness. By the 
resettably-sound argument of knowledge property of the resSZK protocol and by the 
computational binding property of SHCom we have that V* cannot maintain 
the same determining message and query the prover with two distinct BL 
challenges. Thus the resetting power is suppressed and $V^*_{\mathsf{BL}}$ can replicate the 
same attack as V*. The second difficulty is that for each protocol message 
the honest prover of $\Pi_b$ is required to send an rZAP proving that the messages 
are consistent with the randomness committed in the determining message. 
However, in the reduction $V^*_{\mathsf{BL}}$ forwards the messages received from an external 
prover of the BL protocol, therefore it cannot prove the consistency with the 
determining message. Nevertheless, since we are in the case in which the 
trapdoor theorem is true, $V^*_{\mathsf{BL}}$ can forward the external messages and compute 
the rZAPs using the witness of the trapdoor theorem. Due to the resettable 
WI property of rZAP such a deviation from the honest prover is not detected 
by any PPT V*. Then, by the WI of the BL protocol, hybrids $H_{0,0}$ and $H_{1,0}$ are 
computationally indistinguishable. 


Hy (with 1 <i<n,1<j<t): In hybrid Hee in session (i, j), P runs protocol 
Ii; using the witness sampled from Yj, while protocol J, is run by using a 
witness sampled from Yo, and b is the bit committed in the determining 
message of such session. The only difference between experiment Hj, and 
Ho is that in experiment chee in session (i, j), the witness is sampled 
from Y; in the sub-protocol where the trapdoor theorem is false. Note that 
H? = Hio. Assume that there exists a distinguisher between Họ% and 
HE ~ then it is possible to construct an adversary for the hiding of the 
commitment scheme SBCom. The reduction works as follows. A playing in 
the hiding experiment obtains the challenge commitment C. Then it runs V* 
as sub-routine and simulates the honest prover P as in experiment HET I1, 
except that in session (i, j) it proceeds as follows. It computes c,,, oe as the 
honest prover, while it sets cs = C, and sends the first round to V*. Then 
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A uniformly chooses a bit b and executes the protocol m, using a witness 
sampled from distribution Y, and protocol 7; using the witness sampled from 
distribution Yo. Note that A can run both sub-protocols without knowing 
the opening of C since also the honest P never uses such witness in the 
protocol execution. When V* terminates its execution, A hands the output 
of V* to the distinguisher and outputs whatever the distinguisher outputs. 
If C is a commitment of b then the experiment simulated by A is distributed 
identically to experiment H4717". Else if C is a commitment of b then the 


experiment is distributed as experiment H. By the computational hiding 


of SBCom we have that experiments He and H” -1 are computationally
indistinguishable. 

1,1: In this hybrid, P uses a witness sampled from Y, to run protocol Ip 
and the bit b is committed in the determining message. The only differ- 
ence between experiment Hg Í and experiment Hı ı is in the witness used 
to run sub-protocol I. By the same arguments put forth in proving the 
indistinguishability of hybrid H1,o and Hoo, experiments Hg + and Hy; are 
computationally indistinguishable. This completes the proof.


Theorem 1. If trapdoor permutations and collision-resistant hash functions
exist, then the protocol shown in Fig. U] is a Simultaneously Resettable Witness
Indistinguishable Argument of Knowledge.


3 Application in the BPK Model 


Here we show that by combining two instances of simresWIAoK we obtain the
first constant-round simultaneously resettable ZK AoK (simresZKAoK) in the
BPK model.


High-level overview of protocol and proof. The construction is very simple since
it takes advantage of the properties guaranteed by the protocol simresWIAoK.
We use it twice, once for a proof given by the verifier and once for a proof
given by the prover. First, the verifier uses simresWIAoK to prove knowledge of
its secret key (one out of two possible sets of pre-images of a OWF), then the
prover commits to its witness and finally uses simresWIAoK to prove that the
committed message is either a witness for the theorem $x \in L$ or a secret key.
The intuition of why the protocol works is the following. First of all, the
secret key of the verifier is protected by the one-wayness of the OWF, by the
rWI property of the simresWIAoK given by the verifier and by the resettable
argument of knowledge property of the simresWIAoK given by the prover. Indeed,
we will be able to prove that the witness extracted from the proof given by the
prover can only be a witness for $x \in L$, otherwise we break either the
hardness of the OWF or the rWI property of simresWIAoK. The security for the
prover, instead, comes from the existence of a simulator against any resetting
verifier. Indeed, we can design a simulator as follows: the simulator starts a
main thread that is always updated with new messages until the simulator is
stuck. This event happens when the simulator is supposed to commit to a witness
and then play the second simresWIAoK. At this point, the simulator suspends the
main thread and starts some rewinding threads in order to extract the secret key
used by the adversarial verifier in that session. Once this is done, the
simulator continues the main thread since it is not stuck anymore (i.e., it can
simply commit to the extracted secret and use it as witness in the second
simresWIAoK). Since the number of identities of possible verifiers in the BPK
model is polynomially bounded, the simulator has to start only an expected
polynomial number of rewinding threads, and thus its expected running time is
polynomial. The indistinguishability of the view comes from the hiding of the
commitment scheme and the rWI property of the second simresWIAoK. The resettable
argument of knowledge property of the first simresWIAoK (i.e., the one given by
the verifier), instead, is helpful for guaranteeing the expected running time of
the simulator. The commitment played in between the two executions of
simresWIAoK plays an important role in breaking a possible malleability attack
of the malicious sender.

The formal description of the protocol is provided in Fig. 3. For underlying
primitives, we use a non-interactive statistically binding commitment scheme,
denoted by SBCom, and a one-way function $g : \{0,1\}^* \to \{0,1\}^*$. In the
protocol we use the following two NP relations: 1) a pair
$((y, g), x) \in R_{\mathsf{ow}}$ if x is such that $y = g(x)$; 2) a pair
$((c, m), r) \in R_{\mathsf{SBCom}}$ if the string r is such that
$c = \mathsf{SBCom}(m, r)$.


Theorem 2. If trapdoor permutations and collision-resistant hash functions ex- 
ist, then protocol simresZKAoK is a constant-round simultaneously resettable 
zero-knowledge argument of knowledge in the BPK model. 


For lack of space, the formal proof can be found in the full version of this paper. 


4 Simultaneously Resettable Identification Schemes 


In this section, we present the second application of our main protocol, the
first construction of a simultaneously resettable identification scheme.
Identification schemes represent one of the most successful practical
applications of cryptographic protocols. The basic goal of an identification
scheme is to prevent an adversary A from impersonating an honest user P to
another honest user V. However, this is not sufficient for some applications.
Indeed, consider the case in which V provides a service to P, and the service is
restricted only to a small community controlled by V. Then, P could give to
another party T that is not in the small community some partial information
about his secret that is sufficient for T to obtain the service from V, while T
still does not know P's secret. The proof of knowledge property allows us to do
secure identification as well as preventing the attack described above. When the
identification protocol is a proof of knowledge, the sole fact that T convinces
V is sufficient to claim that one can extract the whole secret from T. This
implies that T obtained P's secret key corresponding to his identity, and this
is unlikely to happen in scenarios where the same secret key is used for other
critical tasks such as digital signatures. As discussed in the introduction, our
simultaneously resettable identification scheme follows the above proof of
knowledge paradigm. This extends the previous work of Bellare et al. [5] to a
setting in which every party can be reset. We emphasize that our simultaneously
resettable identification scheme is easily obtained from our main protocol
simresWIAoK, thus achieving constant round complexity.


Protocol simresZKAoK

Ingredients: One-way function g, statistically binding commitment scheme
SBCom, sub-protocol simresWIAoK.

Key-Registration Phase:

V chooses a pair of secrets $(sk_0, sk_1)$ where $sk_b \in \{0,1\}^n$ for
$b \in \{0,1\}$. Then V generates the corresponding public key $(pk_0, pk_1)$
such that $pk_b = g(sk_b)$ for $b \in \{0,1\}$. V publishes $(pk_0, pk_1)$ in
the public file F and stores $sk_b$ as its secret trapdoor information, with
$b \leftarrow \{0,1\}$. We assume that the i-th verifier V has public key
$(pk_0^i, pk_1^i)$ and secret key $sk_b^i$.

Main-Execution Phase:

Common input: NP-statement $x \in L$ and the verifier's identity i. Hence,
prover P knows the public key $(pk_0^i, pk_1^i)$ in F, chosen by V.

Input for P: Witness w such that $(x, w) \in R_L$ and randomness $r_P$.

Input for V: Randomness $r_V$, secret key $sk_b^i$.

- P: Obtain a sufficiently long pseudo-random tape
  $r_P = f_{r_P}(x \| pk_0^i \| pk_1^i)$. From now on, P uses $r_P$ for the
  execution in the rest of the protocol. For convenience, we assume that $r_P$
  consists of four partitions, $r_P(1)$, $r_P(2)$, $r_P(3)$ and $r_P(4)$.

- (V → P): V proves, by using simresWIAoK, the following statement:

  There exists $sk_b^i$ such that
  $((pk_0^i, g), sk_b^i) \in R_{\mathsf{ow}} \lor ((pk_1^i, g), sk_b^i) \in R_{\mathsf{ow}}$.

  For the execution of simresWIAoK, P uses random tape $r_P(1)$.

- (P → V): If the above proof is rejecting, then P aborts. Otherwise, P commits
  to w and $0^n$ as $c_0 \leftarrow \mathsf{SBCom}(w, r_P(2))$ and
  $c_1 \leftarrow \mathsf{SBCom}(0^n, r_P(3))$. Then, P sends $c_0$ and $c_1$
  to V.

- (P → V): P, by using simresWIAoK and random tape $r_P(4)$, proves to V the
  following statement:

  $\exists (w, r)$ such that $(x, w) \in R_L \land ((c_0, w), r) \in R_{\mathsf{SBCom}}$ OR
  $\exists (sk, r)$ such that $((pk_0^i, g), sk) \in R_{\mathsf{ow}} \land ((c_1, sk), r) \in R_{\mathsf{SBCom}}$ OR
  $\exists (sk, r)$ such that $((pk_1^i, g), sk) \in R_{\mathsf{ow}} \land ((c_1, sk), r) \in R_{\mathsf{SBCom}}$.

- V: output "accept" if and only if the proof provided by P is accepting.


Fig. 3. Constant-Round Simultaneously Resettable ZKAoK in the BPK Model


Identification protocols secure against reset attacks. We introduce the notion of
Reset-Reset-1 security as a generalization of the Concurrent-Reset-1 (CR1)
notion introduced in [5]. CR1 considers an adversary I, called impersonator,
that plays in two phases. In the first phase, it interacts with a prover as a
resetting verifier (Reset phase). In the second phase, it has no access to the
prover anymore, but it tries to impersonate such a prover to an honest verifier
(Concurrent phase). In the second phase, I is not allowed to reset the verifier.
In our new definition Reset-Reset-1 (RR1) the impersonator is allowed to reset
in both phases. The formal definition is a straightforward extension of the one
given in [5] and can be found in the full version of this work.


The protocol ID. Let $f : \{0,1\}^n \to \{0,1\}^*$ be a one-way function, and
let n be the security parameter. The public key of P is the pair
$(pk_0, pk_1)$, the secret key is $x_d$ for a randomly chosen bit d, such that
$pk_0 = f(x_d) \lor pk_1 = f(x_d)$. The protocol simply consists of P running
the simresWIAoK protocol with V to prove that it knows the preimage of either
$pk_0$ or $pk_1$. Formally, let $\Lambda_{ID}$ be the following language:
$\Lambda_{ID} = \{(y_0, y_1) : \text{there exists } x \in \{0,1\}^n \text{ s.t. } y_0 = f(x) \lor y_1 = f(x)\}$;
then the identification scheme consists of P proving the statement
$(pk_0, pk_1) \in \Lambda_{ID}$ using simresWIAoK.
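
The key structure of the scheme can be illustrated with a minimal sketch. It covers only key generation and the witness relation behind the language, not the simresWIAoK proof itself. SHA-256 stands in for the one-way function f purely as an illustrative assumption, and the names `keygen` and `in_relation` are hypothetical.

```python
import hashlib
import secrets


def f(x: bytes) -> bytes:
    """Stand-in for the one-way function f (SHA-256 is an assumption, not from the paper)."""
    return hashlib.sha256(x).digest()


def keygen(n_bytes: int = 16):
    """Return (public key, secret key) where the secret is one of two preimages."""
    xs = [secrets.token_bytes(n_bytes), secrets.token_bytes(n_bytes)]
    pk = (f(xs[0]), f(xs[1]))
    d = secrets.randbelow(2)          # randomly chosen bit d
    return pk, (d, xs[d])             # the other preimage is discarded


def in_relation(pk, witness: bytes) -> bool:
    """Witness check for the statement: y_0 = f(x) or y_1 = f(x)."""
    return f(witness) in pk


pk, (d, x_d) = keygen()
```

A verifier never sees `x_d`; in the actual scheme P convinces V that such a witness exists by running simresWIAoK on the statement `pk`.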


Theorem 3. If a constant-round simultaneously resettable WIAoK protocol exists
and one-way functions exist, then the above protocol is constant-round and
secure in the RR1 setting.


Proof. Let $pk = (pk_0, pk_1)$ be the public key of a player P. Assume that
there exists a PPT adversary I playing the RR1 experiment that succeeds in
impersonating an honest P with non-negligible probability. This means that I is
able to prove to an honest V that her identity is $pk = (pk_0, pk_1)$. Then we
show that I can be used to construct an adversary against the one-wayness of f,
or a distinguisher for the resettable WI property of the simresWIAoK protocol.
The resettable argument of knowledge property of the simresWIAoK protocol is
crucial to put forth both reductions.

Recall that, in the RR1 game, I plays the first phase interacting as a resetting
verifier V* with P, and in the second phase interacts as a resetting prover P*
with V, trying to impersonate P.

First we show an adversary A that breaks the one-wayness of f. A has as input a
challenge y that is the output of f(x) for some unknown x. The reduction works
as follows. A picks $d \in \{0,1\}$, $x_d \in \{0,1\}^n$ and computes
$pk_d = f(x_d)$ and $pk_{1-d} = y$. Then it runs I as a subroutine: in the first
phase A simulates the honest prover playing the simresWIAoK protocol with
witness $x_d$. In the second phase, A simulates the honest verifier to I. If I
provides an accepting proof, then A runs the extractor of the simresWIAoK
protocol and, by the resettable argument of knowledge property, except with
negligible probability, it obtains the witness used by I in the proof. In order
to run the extractor, A prepares an augmented machine that internally contains
all messages belonging to the first phase so that they can be internally played
with I, while the messages sent by I in the second phase are forwarded to the
extractor. Now note that during the extraction process the extractor rewinds the
machine several times, changing the protocol messages (of the second phase);
therefore I could change her messages accordingly. Note however that, since
there is a separation between the first phase and the second phase, this does
not require re-playing messages of the first phase. Since, by assumption, f is a
one-way function, the probability that the witness extracted corresponds to a
pre-image of y is negligible.

Now, assume that the witness extracted from I is $x_d$. Then we can construct a
distinguisher $A_{WI}$ for the resettable witness indistinguishability property
of simresWIAoK. $A_{WI}$ works as follows. It computes $pk_0 = f(x_0)$,
$pk_1 = f(x_1)$ and activates an external prover for the simresWIAoK protocol
with inputs $((pk_0, pk_1), (x_0, x_1))$. In the first phase, when I runs as a
verifier, $A_{WI}$ forwards all messages to the external prover of the
simresWIAoK. In the second phase, when I runs as a prover, $A_{WI}$ follows the
procedure of the honest verifier. Then, if I provides an accepting proof,
$A_{WI}$ runs the extractor of the simresWIAoK protocol. Finally, by the
resettable argument of knowledge property, except with negligible probability,
it obtains the witness used by I in the proof, i.e. it obtains $x_0$ or $x_1$.
Now notice that in the previous experiment, when we tried to invert the one-way
function, the witness extracted corresponded to the one used in the first phase,
while I was verifying the proof. Since this second experiment is identical to
the previous one, it is again true that the extracted witness corresponds to the
one used by the prover. Since the prover now is the external prover of
simresWIAoK, the above adversary $A_{WI}$ breaks the rWI property of
simresWIAoK. By the rWI property of simresWIAoK, this event happens with
negligible probability only, and thus I wins the RR1 game with negligible
probability.


Subspace LWE


Krzysztof Pietrzak* 


IST Austria 


Abstract. The (decisional) learning with errors problem (LWE) asks 
to distinguish “noisy” inner products of a secret vector with random 
vectors from uniform. The learning parities with noise problem (LPN) 
is the special case where the elements of the vectors are bits. In recent 
years, the LWE and LPN problems have found many applications in 
cryptography. 

In this paper we introduce a (seemingly) much stronger adaptive as- 
sumption, called “subspace LWE” (SLWE), where the adversary can learn 
the inner product of the secret and random vectors after they were pro- 
jected into an adaptively and adversarially chosen subspace. We prove 
that, surprisingly, the SLWE problem mapping into subspaces of dimen- 
sion d is almost as hard as LWE using secrets of length d (the other 
direction is trivial.) 

This result immediately implies that several existing cryptosystems 
whose security is based on the hardness of the LWE/LPN problems are 
provably secure in a much stronger sense than anticipated. As an illustra- 
tive example we show that the standard way of using LPN for symmetric 
CPA secure encryption is even secure against a very powerful class of re- 
lated key attacks. 


1 Introduction 


The (search version of the) learning with errors problem (LWE) is specified by
parameters $\ell, q \in \mathbb{N}$ and an error distribution $\chi$ over
$\mathbb{Z}_q$. It asks to find a secret vector
$\mathbf{s} \in \mathbb{Z}_q^{\ell}$ given any number of "noisy" inner products
of $\mathbf{s}$ with random vectors. Formally, these products are samples from a
distribution $\Lambda_{\chi,\ell}(\mathbf{s})$ over $\mathbb{Z}_q^{\ell+1}$
which is defined by sampling a uniform
$\mathbf{r} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell}$ and an error
$e \leftarrow \chi$, and outputting
$(\mathbf{r}, \mathbf{r}^{T}.\mathbf{s} + e)$ (where multiplications and
additions are all modulo q.)

An important special case of this problem is Regev's LWE problem where $\chi$
is a so-called discrete Gaussian distribution and q is polynomial or exponential
in a security parameter. Another important case is the learning parities with
noise problem (LPN) where q = 2.
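
The sampling procedure just described can be sketched in a few lines of Python. This is only an illustration under assumed toy parameters (ℓ = 8, q = 2, Bernoulli error with parameter 1/8); the function names are hypothetical, and Python's `random` module is not a cryptographic source of randomness.

```python
import random


def lwe_sample(s, q, chi, rng):
    """Return one sample (r, <r, s> + e mod q) for the secret s over Z_q."""
    r = [rng.randrange(q) for _ in s]      # uniform r in Z_q^ell
    e = chi(rng)                           # error drawn from chi
    b = (sum(ri * si for ri, si in zip(r, s)) + e) % q
    return r, b


def bernoulli(tau):
    """Error distribution over Z_2 that outputs 1 with probability tau (the LPN case)."""
    return lambda rng: 1 if rng.random() < tau else 0


rng = random.Random(0)
ell, q = 8, 2                              # toy parameters, for illustration only
secret = [rng.randrange(q) for _ in range(ell)]
samples = [lwe_sample(secret, q, bernoulli(0.125), rng) for _ in range(5)]
```

Passing a different `chi` (for example a rounded Gaussian) and a larger prime q gives samples in the style of Regev's LWE with the same code.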

The decisional version of the LWE problem asks to distinguish samples of
the form $\Lambda_{\chi,\ell}(\mathbf{s})$ from uniform (which might be easier
than to actually output $\mathbf{s}$ as required by the computational version of
the problem). The decisional LWE problem has been proven polynomially equivalent
to the computational version if q is prime [Reg05], and in particular for LPN
[KS06]. In this paper we will always consider the decisional version of the
problem, and we also only prove the main result for the case where q is prime.


Regev's LWE. The LWE problem has proven to be extremely useful to construct
cryptographic schemes. One reason is its versatility: pretty much any
cryptographic primitive known to date can be based on LWE. Another reason is its
hardness. The best known algorithms against Regev's LWE (where $\chi$ is a
discrete Gaussian and $q = \mathrm{poly}(\ell)$) need time and space
$2^{\Theta(\ell)}$ to recover $\mathbf{s} \in \mathbb{Z}_q^{\ell}$,¹ and unlike
for most other assumptions on which public-key crypto can be based, no faster
quantum algorithms for the problem are known. But most strikingly, Regev's LWE
is as hard as worst-case (standard) lattice assumptions [Reg05, Pei09].

An incomplete list of cryptosystems whose security can be reduced to LWE is
public-key encryption secure against chosen plaintext and chosen ciphertext
attacks [PW08, Pei09], circular-secure encryption [ACPS09], identity-based
encryption, oblivious transfer, collision-resistant hash functions and
public-key identification schemes [Lyu08, Lyu09].
LPN. The learning parity with noise (LPN) problem [BFKL94, BKW00] is the
special case of the LWE problem where q = 2 (i.e. we work over bits) and the
error distribution is the Bernoulli distribution for some constant parameter
$\tau$, $0 < \tau < 1/2$, denoted $\mathsf{Ber}_\tau$, and defined as
$\Pr[x = 1 ;\ x \leftarrow \mathsf{Ber}_\tau] = \tau$. The LPN problem is
closely related to the problem of decoding random linear codes,² a well studied
question in coding theory. The LPN problem seems less versatile than the general
LWE problem, and so far only "minicrypt" primitives (i.e. primitives known to be
equivalent to one-way functions) were constructed under the LPN assumption.
Alekhnovich constructs a public-key encryption scheme from a relaxed LPN
assumption where the error $\tau$ is not constant but upper bounded as a
function of $\ell$ as $\tau = O(1/\sqrt{\ell})$.

The appeal of the LPN problem comes from the fact that LPN based schemes
can be extremely efficient, just requiring relatively few bit-level operations to
compute an inner product of two bit-vectors. Constructions from LPN include
PRGs and encryption schemes and public-key authentication schemes [Ste94], but
by far most work has been done on efficient LPN based authentication schemes,
which we'll discuss in more detail in Section 4.


Subspace LWE. The LWE problem has been shown to be very robust with
respect to leakage. Distinguishing LWE samples remains hard even if the
adversary can learn a function $f(\mathbf{s})$ about the secret $\mathbf{s}$ as
long as $f(.)$ is compressing or hard to invert [GKPV10]. In this paper we show
that the LWE problem is also very robust to tampering with the secret vector
$\mathbf{s}$ and the randomness vector $\mathbf{r}$ (albeit not with the noise
e.)


1 This is slightly better than a trivial brute-force search which takes time
$\approx q^{\ell} = 2^{\Theta(\ell \log \ell)}$ but only linear space.

2 The only difference is that in the decoding problem one is given a fixed number of
samples (typically a small multiple of the length of the secret), whereas in the LPN
problem the adversary can ask for arbitrarily many samples.

We define a (seemingly) much stronger adaptive version of LWE which we
call "Subspace LWE", or SLWE for short. In the SLWE problem the adversary
is not restricted to just ask for samples
$\mathbf{r}, \mathbf{r}^{T}.\mathbf{s} + e$ from
$\Lambda_{\chi,\ell}(\mathbf{s})$ as in LWE, but has access to a more powerful
oracle which she can query adaptively. The oracle takes as input the description
of two affine mappings
$\phi_r, \phi_s : \mathbb{Z}_q^{\ell} \to \mathbb{Z}_q^{\ell}$ and outputs a
sample

$$\mathbf{r},\ \phi_r(\mathbf{r})^{T}.\phi_s(\mathbf{s}) + e \quad \text{where } \mathbf{r} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell},\ e \leftarrow \chi$$

An affine mapping $\phi_r : \mathbb{Z}_q^{\ell} \to \mathbb{Z}_q^{\ell}$
(similarly $\phi_s$) is given by a matrix and a vector
$\phi_r = [\mathbf{X}_r \in \mathbb{Z}_q^{\ell \times \ell}, \mathbf{x}_r \in \mathbb{Z}_q^{\ell}]$
and its evaluation is defined as

$$\phi_r(\mathbf{r}) = \mathbf{X}_r.\mathbf{r} + \mathbf{x}_r$$

Without additional restrictions, the SLWE problem as just defined is easy to
break. By choosing the input to the oracle appropriately³ one can e.g. learn
samples of the form $\mathbf{s}[i] + e$, $e \leftarrow \chi$ ($\mathbf{s}[i]$
denotes the ith element of $\mathbf{s}$.) For distributions $\chi$ as used in
LPN or Regev's LWE one can efficiently learn $\mathbf{s}[i]$ (and thus the
entire $\mathbf{s}$) from just a few such samples.
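
To make the attack on the unrestricted problem concrete, here is a small simulation for the LPN case (q = 2). It is a sketch under assumed toy parameters: the oracle is hard-wired to the query in which both matrices are zero except for a single one in the ith diagonal entry, so its answer is r[i]·s[i] + e, and the attacker takes a majority vote over the answers with r[i] = 1. The function names and the query count are hypothetical.

```python
import random


def projected_sample(s, i, tau, rng):
    """Oracle answer when X_r = X_s has a single one at diagonal entry i and x_r = x_s = 0."""
    r = [rng.randrange(2) for _ in s]
    e = 1 if rng.random() < tau else 0     # Bernoulli error
    return r, (r[i] * s[i] + e) % 2


def recover_bit(s, i, tau, rng, n_queries=400):
    """Majority vote over the answers whose r[i] equals 1 (these equal s[i] + e)."""
    answers = (projected_sample(s, i, tau, rng) for _ in range(n_queries))
    votes = [b for r, b in answers if r[i] == 1]
    return 1 if 2 * sum(votes) > len(votes) else 0


rng = random.Random(1)
secret = [rng.randrange(2) for _ in range(16)]   # held by the oracle only
guess = [recover_bit(secret, i, 0.125, rng) for i in range(len(secret))]
```

The secret is passed to `recover_bit` only so that the simulated oracle can answer; the vote itself uses nothing but the oracle outputs. This is exactly the behaviour that the rank restriction on the queries rules out.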

We prove that the SLWE problem (using secrets in $\mathbb{Z}_q^{\ell}$ and error
distribution $\chi$) is almost as hard as the standard $(q, \chi, d)$-LWE
problem with secrets of length $d \le \ell$ if the adversary is restricted in
the sense that she is only allowed to query $\phi_r, \phi_s$ which "overlap" in
a $d + \delta$ (or more) dimensional subspace, where $\delta \in \mathbb{N}$ is
a statistical security parameter. Formally this means
$\mathbf{X}_r^{T}.\mathbf{X}_s$ must have rank at least $d + \delta$. We call
this the $(q, \chi, \ell, d + \delta)$-SLWE problem. Let us mention that the
other direction (showing that $(q, \chi, \ell, d)$-SLWE is at most as hard as
$(q, \chi, d)$-LWE) is trivial.

The precise statement of our result asserts that for any
$\ell, d, \delta \in \mathbb{N}$, $d + \delta \le \ell$, the
$(q, \chi, \ell, d + \delta)$-SLWE problem is at most an additive term
$2/q^{\delta+1}$ easier than the standard $(q, \chi, d)$-LWE problem. For large
fields, where q is superpolynomial, $2/q^{\delta+1}$ is negligible already for
$\delta = 0$. For small fields, in particular the important case q = 2 as used
in LPN, we must choose some $\delta$ to be a statistical security parameter.

The above formulation of SLWE is somewhat redundant, in the sense that an
adversary who is restricted to always choose $\phi_s$ to be the identity
function is as powerful (i.e. can learn exactly the same distribution from the
SLWE oracle) as the adversary described above. We chose to explicitly allow the
adversary to
choose affine mappings for the randomness and the secret separately, as for the 
applications it is sometimes more convenient to think of the adversary being able 
to apply a mapping to the secret key (like in the setting of related key attacks 
we'll discuss), or to the randomness (e.g. to show that LWE is hard, even if the 
randomness comes from a bit-fixing source.) 


3 Set $\mathbf{x}_r = \mathbf{x}_s = 0^{\ell}$ and $\mathbf{X}_r, \mathbf{X}_s$
the zero matrix with a single one in the ith diagonal element. The oracle will
output $\mathbf{r}, \mathbf{r}[i]\mathbf{s}[i] + e$; the last element is
$\mathbf{s}[i] + e$ if $\mathbf{r}[i] = 1$.

When q is not Prime. The reduction from SLWE to LWE assumes that q is
prime, as we use the fact that $\mathbb{Z}_q$ is a field and
$\mathbb{Z}_q^{\ell}$ is a vector space.⁴ We believe that the proof of the
reduction can be adapted to the case where q is composite.

The case where q is prime covers the cryptographically most interesting cases
of LPN and Regev's LWE. Also the reduction from the search to decision version
of LWE only works for prime q (of polynomial size.) But the case where
q is not prime has found cryptographic applications too. In particular, the case
where $q = p^{e}$ for a prime p and e > 1 has been used in the construction of
an encryption scheme with circular security [ACPS09]. The case where q is a 
product of distinct, small primes has been used in : 


Applications of SLWE. In Section 4 we'll discuss some applications of the
SLWE problem. In particular, the fact that SLWE is equivalent to LWE implies
stronger security notions, like security against related-key attacks, that one
can give for existing schemes whose security is reduced to the LWE problem.
In subsequent work, the hardness of SLPN has been used to construct efficient
authentication schemes and even MACs from LPN. These schemes differ
significantly from previous schemes, which all were extensions of the
Hopper-Blum protocol.


Outline. In Section 2 we first define the LWE and the new subspace LWE
(SLWE) problem. In Section 3 we state and prove our main technical result
(Theorem 1), which bounds the hardness of the SLWE problem in terms of the
hardness of the standard LWE problem. In Section 4 we describe in more detail
some applications of this result which were already mentioned in the
introduction.


2 Hard Learning Problems 


2.1 Notation 


We denote the set of integers modulo an integer q > 1 by $\mathbb{Z}_q$. We will
use normal, bold and capital bold letters like $x, \mathbf{x}, \mathbf{X}$ to
denote single elements, vectors and matrices over $\mathbb{Z}_q$, respectively.
For $\mathbf{x} \in \mathbb{Z}_q^{\ell}$, $|\mathbf{x}| = \ell$ denotes the
length and $\mathsf{wt}(\mathbf{x})$ denotes the Hamming weight of the vector
$\mathbf{x}$, i.e. the number of indices $i \in \{1, \ldots, |\mathbf{x}|\}$
where $\mathbf{x}[i] \neq 0$. For $\mathbf{v} \in \mathbb{Z}_2^{\ell}$ we denote
with $\bar{\mathbf{v}}$ its inverse, i.e.
$\bar{\mathbf{v}}[i] = 1 - \mathbf{v}[i]$ for all i. For a distribution $\chi$,
$x \leftarrow \chi$ denotes sampling a value x with distribution $\chi$. For a
set S, $x \overset{\$}{\leftarrow} S$ denotes sampling a value x with the
uniform distribution over S.


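$\mathbf{x}_{\downarrow \mathbf{v}}, \mathbf{X}_{\downarrow \mathbf{v}}$ : For
two vectors $\mathbf{v} \in \mathbb{Z}_2^{\ell}$ and
$\mathbf{x} \in \mathbb{Z}_q^{\ell}$, we denote with
$\mathbf{x}_{\downarrow \mathbf{v}}$ the vector (of length
$\mathsf{wt}(\mathbf{v})$) which is derived from $\mathbf{x}$ by deleting all
the entries $\mathbf{x}[i]$ where $\mathbf{v}[i] = 0$. If
$\mathbf{X} \in \mathbb{Z}_q^{\ell \times m}$ is a matrix, then
$\mathbf{X}_{\downarrow \mathbf{v}} \in \mathbb{Z}_q^{\mathsf{wt}(\mathbf{v}) \times m}$
denotes the submatrix we get by deleting the ith row if $\mathbf{v}[i] = 0$.

$\mathbf{x}_{\wedge \mathbf{v}}, \mathbf{X}_{\wedge \mathbf{v}}$ : For
$\mathbf{x}, \mathbf{X}, \mathbf{v}$ as in the previous item,
$\mathbf{x}_{\wedge \mathbf{v}}$ denotes the vector where the ith entry is
$\mathbf{x}[i] \wedge \mathbf{v}[i]$. Think of $\mathbf{x}_{\wedge \mathbf{v}}$
as $\mathbf{x}$ where all entries of $\mathbf{x}$ where $\mathbf{v}$ is 0 are
set to 0. $\mathbf{X}_{\wedge \mathbf{v}}$ denotes the matrix $\mathbf{X}$ where
the ith row is set to all 0 if $\mathbf{v}[i] = 0$.


4 The fact that $\mathbb{Z}_q^{\ell}$ is a vector space is e.g. used in the proof of Lemma 1.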


2.2 The (Subspace) LWE Problem 


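The (search version of the) learning with errors (LWE) problem is specified by
parameters $\ell, q \in \mathbb{N}$ and an error distribution $\chi$ over
$\mathbb{Z}_q$. It asks to find a secret vector
$\mathbf{s} \in \mathbb{Z}_q^{\ell}$ given any number of "noisy" inner products
of $\mathbf{s}$ with random vectors.

Formally, let $\Lambda_{\chi,\ell}(\mathbf{s})$ be the distribution over
$\mathbb{Z}_q^{\ell+1}$ where a sample is given by

$$(\mathbf{r}, \mathbf{r}^{T}.\mathbf{s} + e) \leftarrow \Lambda_{\chi,\ell}(\mathbf{s}) \quad \text{where } \mathbf{r} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell},\ e \leftarrow \chi$$

Let $U_q^{m}$ denote the uniform distribution over $\mathbb{Z}_q^{m}$ and
$U_q = U_q^{1}$. The decisional LWE problem asks to distinguish samples from
$\Lambda_{\chi,\ell}(\mathbf{s})$ with a uniform $\mathbf{s}$ from a random
oracle (outputting $U_q^{\ell+1}$ samples.) For any $\mathbf{s}$,
$\Lambda_{U_q,\ell}(\mathbf{s})$ is the same as the uniform distribution
$U_q^{\ell+1}$. It will be convenient for the proof to think of the random
oracle as outputting samples from $\Lambda_{U_q,\ell}(\mathbf{s})$ for some
random $\mathbf{s}$ instead of $U_q^{\ell+1}$.


Definition 1 (Decisional Learning with Errors Problem (LWE)). The
(decisional) $(q, \chi, \ell)$-LWE problem is $(t, Q, \epsilon)$ hard if for
every distinguisher D running in time t and making Q oracle queries,

$$\Big| \Pr[\mathbf{s} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell} : D^{\Lambda_{\chi,\ell}(\mathbf{s})} \to 1] - \underbrace{\Pr[\mathbf{s} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell} : D^{\Lambda_{U_q,\ell}(\mathbf{s})} \to 1]}_{\Pr[D^{U_q^{\ell+1}} \to 1]} \Big| \le \epsilon. \tag{1}$$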

Usually one defines the LWE problem by considering a distinguisher who gets a 
polynomial number of samples as input and not access to an oracle (which doesn’t 
take inputs anyway.) We use this oracle based definition so it is more similar to 
the SLWE problem we define below, where the oracle does take adaptively chosen 
inputs. 

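An affine projection $\phi : \mathbb{Z}_q^{\ell} \to \mathbb{Z}_q^{\ell}$ is
given by a matrix/vector tuple
$\mathbf{X} \in \mathbb{Z}_q^{\ell \times \ell}, \mathbf{x} \in \mathbb{Z}_q^{\ell}$
and defined as $\phi(\mathbf{v}) := \mathbf{X}.\mathbf{v} + \mathbf{x}$.

For $\mathbf{s} \in \mathbb{Z}_q^{\ell}$ and affine projections
$\phi_r = [\mathbf{X}_r, \mathbf{x}_r]$, $\phi_s = [\mathbf{X}_s, \mathbf{x}_s]$
we define the distribution $\Gamma_{\chi,\ell,d}(\mathbf{s}, \phi_r, \phi_s)$
over $\mathbb{Z}_q^{\ell+1} \cup \bot$ as

$$\bot \leftarrow \Gamma_{\chi,\ell,d}(\mathbf{s}, \phi_r, \phi_s) \quad \text{if } \mathrm{rank}(\mathbf{X}_r^{T}.\mathbf{X}_s) < d$$

and

$$(\mathbf{r}, \phi_r(\mathbf{r})^{T}.\phi_s(\mathbf{s}) + e) \leftarrow \Gamma_{\chi,\ell,d}(\mathbf{s}, \phi_r, \phi_s) \quad \text{where } \mathbf{r} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell},\ e \leftarrow \chi$$

otherwise. With $\Gamma_{\chi,\ell,d}(\mathbf{s}, \cdot, \cdot)$ we denote the
oracle which on input $\phi_r, \phi_s$ outputs a sample of
$\Gamma_{\chi,\ell,d}(\mathbf{s}, \phi_r, \phi_s)$.

Definition 2 (Subspace Learning with Errors Problem (SLWE)). The
(decisional) $(q, \chi, \ell, d)$-SLWE problem is $(t, Q, \epsilon)$ hard if for
every distinguisher D running in time t and making Q oracle queries,

$$\Big| \Pr[\mathbf{s} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell} : D^{\Gamma_{\chi,\ell,d}(\mathbf{s},\cdot,\cdot)} \to 1] - \Pr[\mathbf{s} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell} : D^{\Gamma_{U_q,\ell,d}(\mathbf{s},\cdot,\cdot)} \to 1] \Big| \le \epsilon. \tag{2}$$

Note that by definition the $\Gamma_{U_q,\ell,d}(\mathbf{s}, \cdot, \cdot)$
oracle outputs $\bot$ if the input satisfies
$\mathrm{rank}(\mathbf{X}_r^{T}.\mathbf{X}_s) < d$ and a uniform sample
$U_q^{\ell+1}$ otherwise. In particular, like $\Lambda_{U_q,\ell}(\mathbf{s})$,
the output distribution of $\Gamma_{U_q,\ell,d}(\mathbf{s}, \cdot, \cdot)$ is
independent of $\mathbf{s}$.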
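
The oracle from the definition above can be sketched directly. The code below is an illustrative toy implementation for prime q, not an optimized one: matrices are lists of rows, `None` stands for the symbol ⊥, the rank is computed by Gaussian elimination over the field Z_q, and all function names are hypothetical.

```python
import random


def rank_mod_q(M, q):
    """Rank of the matrix M (list of rows) over the field Z_q, q prime."""
    M = [[x % q for x in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rank, len(M)) if M[i][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, q)     # modular inverse (Python 3.8+)
        M[rank] = [x * inv % q for x in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][col]:
                factor = M[i][col]
                M[i] = [(a - factor * b) % q for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank


def matvec(X, v, q):
    """Matrix-vector product X.v over Z_q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in X]


def slwe_oracle(s, Xr, xr, Xs, xs, d, q, chi, rng):
    """One sample of the SLWE distribution for the query ([Xr, xr], [Xs, xs])."""
    ell = len(s)
    # Entry (i, j) of X_r^T.X_s is sum_k Xr[k][i] * Xs[k][j].
    XtX = [[sum(Xr[k][i] * Xs[k][j] for k in range(ell)) % q
            for j in range(ell)] for i in range(ell)]
    if rank_mod_q(XtX, q) < d:
        return None                        # the oracle answers with ⊥
    r = [rng.randrange(q) for _ in range(ell)]
    phi_r = [(a + b) % q for a, b in zip(matvec(Xr, r, q), xr)]
    phi_s = [(a + b) % q for a, b in zip(matvec(Xs, s, q), xs)]
    return r, (sum(a * b for a, b in zip(phi_r, phi_s)) + chi(rng)) % q
```

With both projections set to the identity and d = ℓ, the oracle returns ordinary LWE samples; with a query of rank below d, it returns `None`.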


3 The Hardness of SLWE 


Theorem 1 below is the main technical result stating that the SLWE problem
mapping into subspaces of dimension d is almost as hard as the standard LWE
problem with secrets of length d. But let's first look at the (easy) other
direction as stated by Claim 1 below.


Claim 1 ($(q, \chi, \ell, d)$-SLWE at most as hard as $(q, \chi, d)$-LWE). If
$(q, \chi, \ell, d)$-SLWE is $(s, t, \epsilon)$ hard then $(q, \chi, d)$-LWE is
$(s', t, \epsilon)$ hard where $s' = s - \mathrm{poly}(t, \ell)$.


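Proof (of Claim). To prove this claim we will show how, for any error
distribution $\chi'$, one can efficiently generate $(q, \chi', d)$-LWE samples
which have distribution $\Lambda_{\chi',d}(\mathbf{s}')$ (for some uniform
$\mathbf{s}' \in \mathbb{Z}_q^{d}$) given access to a $(q, \chi', \ell, d)$-SLWE
oracle $\Gamma_{\chi',\ell,d}(\mathbf{s}, \cdot, \cdot)$ (for some uniform
$\mathbf{s} \in \mathbb{Z}_q^{\ell}$.) We do so without knowing the distribution
$\chi'$ or $\mathbf{s}$.

Given such a transformation, we then can use any distinguisher D who breaks
the $(q, \chi, d)$-LWE assumption with advantage $\epsilon$ as defined in
eq.(1), to break the $(q, \chi, \ell, d)$-SLWE assumption as in eq.(2) with the
same advantage by simply transforming the SLWE samples (where the oracle uses
either the error distribution $\chi' = \chi$ or $\chi' = U_q$, but we don't know
which) to LWE samples (with the same unknown error distribution $\chi'$) before
forwarding them to D.

Let $\mathbf{v} := 1^{d} \| 0^{\ell-d}$. To generate samples as described above,
query $\Gamma_{\chi',\ell,d}(\mathbf{s}, \cdot, \cdot)$ so it outputs samples
$\Lambda_{\chi',d}(\mathbf{s}')$ where $\mathbf{s}' \in \mathbb{Z}_q^{d}$
consists of, say, the first d elements of $\mathbf{s} \in \mathbb{Z}_q^{\ell}$,
i.e. $\mathbf{s}' := \mathbf{s}_{\downarrow \mathbf{v}}$. This can be done by
making q queries $\mathbf{x}_s, \mathbf{x}_r, \mathbf{X}_s, \mathbf{X}_r$ to
$\Gamma_{\chi',\ell,d}(\mathbf{s}, \cdot, \cdot)$ where
$\mathbf{x}_r = \mathbf{x}_s = 0^{\ell}$ and $\mathbf{X}_s = \mathbf{X}_r$ is 1
in the first d diagonal entries and 0 everywhere else. The outputs of the SLWE
oracle on these queries are samples of the form

$$\mathbf{r},\ \underbrace{(\mathbf{X}_r.\mathbf{r} + \mathbf{x}_r)^{T}.(\mathbf{X}_s.\mathbf{s} + \mathbf{x}_s)}_{\mathbf{r}_{\downarrow \mathbf{v}}^{T}.\mathbf{s}_{\downarrow \mathbf{v}}} + e \quad \text{where } e \leftarrow \chi',\ \mathbf{r} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell}$$

from which we then get a $\Lambda_{\chi',d}(\mathbf{s}_{\downarrow \mathbf{v}})$
sample
$\mathbf{r}_{\downarrow \mathbf{v}}, \mathbf{r}_{\downarrow \mathbf{v}}^{T}.\mathbf{s}_{\downarrow \mathbf{v}} + e$
by replacing $\mathbf{r}$ with $\mathbf{r}_{\downarrow \mathbf{v}}$. Note that
these samples have the right distribution, which means
$\mathbf{s}_{\downarrow \mathbf{v}}$ and the q
$\mathbf{r}_{\downarrow \mathbf{v}}$'s are uniformly random as required. This is
easy to see recalling that $\mathbf{s}$ and the q $\mathbf{r}$'s are uniform.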
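
The transformation in this proof amounts to one fixed query followed by a truncation. A minimal sketch follows, with the oracle specialized to the query used in the proof (so no rank check is needed, since that query has rank exactly d); the function names are hypothetical.

```python
import random


def slwe_query_first_d(s, d, q, chi, rng):
    """SLWE answer for X_r = X_s = diag(1^d, 0^(ell-d)) and x_r = x_s = 0."""
    r = [rng.randrange(q) for _ in s]
    inner = sum(r[i] * s[i] for i in range(d))   # (X_r.r)^T.(X_s.s)
    return r, (inner + chi(rng)) % q


def to_lwe_sample(slwe_sample, d):
    """Keep only the first d entries of r; the result is an LWE sample for s[:d]."""
    r, b = slwe_sample
    return r[:d], b


rng = random.Random(2)
q, ell, d = 7, 6, 3                               # toy parameters
s = [rng.randrange(q) for _ in range(ell)]
noise = lambda g: g.choice([0, 1, q - 1])         # placeholder error distribution
lwe_samples = [to_lwe_sample(slwe_query_first_d(s, d, q, noise, rng), d)
               for _ in range(4)]
```

Note that the transformation never looks at the error distribution or at `s`, which matches the requirement in the proof that it works without knowing either.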
In the proof of Theorem 1 we'll need the following simple technical Lemma:


Lemma 1. For $q, d, \delta \in \mathbb{N}$, let $\Delta(q, d, \delta)$ denote
the probability that a random matrix in $\mathbb{Z}_q^{(d+\delta) \times d}$ has
rank less than d, then

$$\Delta(q, d, \delta) \le \frac{2}{q^{\delta+1}}.$$

Proof. Assume we sample the d columns of a matrix
$\mathbf{M} \in \mathbb{Z}_q^{(d+\delta) \times d}$ one by one. For
$i = 1, \ldots, d$ let $E_i$ denote the event that the first i columns are
linearly independent, then

$$\Pr[\neg E_i \mid E_{i-1}] = \frac{q^{i-1}}{q^{d+\delta}}$$

as $\neg E_i$ happens iff the ith column (sampled uniformly from a space of size
$q^{d+\delta}$) falls into the space (of size $q^{i-1}$) spanned by the first
$i - 1$ columns. We get further

$$\Delta(q, d, \delta) = \Pr[\neg E_d] \le \sum_{i=1}^{d} \Pr[\neg E_i \mid E_{i-1}] = \sum_{i=1}^{d} \frac{q^{i-1}}{q^{d+\delta}} \le \frac{2}{q^{\delta+1}}.$$

Theorem 1 ($(q, \chi, \ell, d)$-SLWE almost as hard as $(q, \chi, d)$-LWE)
For $q, d, \delta, \ell \in \mathbb{N}$. If the $(q, \chi, d)$-LWE problem is
$(s, t, \epsilon)$ hard, then the $(q, \chi, \ell, d + \delta)$-SLWE problem is
$(s', t, \epsilon')$ hard where


s' = s — poly(£, t) € = e+ 
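Proof (of Theorem 1). To prove the theorem we will show how to sample outputs
of an SLWE oracle $\Gamma_{\chi',\ell,d+\delta}(\hat{\mathbf{s}}, \cdot, \cdot)$
for some uniformly random $\hat{\mathbf{s}} \in \mathbb{Z}_q^{\ell}$ and
adversarially chosen inputs, given only standard LWE samples
$\Lambda_{\chi',d}(\mathbf{s})$ for some uniform
$\mathbf{s} \in \mathbb{Z}_q^{d}$. This sampling is done without knowing
$\mathbf{s}$ or the error distribution $\chi'$.

Given such a transformation, we then can use any distinguisher D who breaks
the $(q, \chi, \ell, d + \delta)$-SLWE assumption with advantage $\epsilon$ to
break the standard $(q, \chi, d)$-LWE assumption with the same advantage, minus
the probability that the transformation will fail (which, unlike in the previous
claim, is non-zero.)

Recall that an LWE sample
$\Lambda_{\chi',d}(\mathbf{s} \in \mathbb{Z}_q^{d})$ is of the form

$$\mathbf{r},\ \mathbf{r}^{T}.\mathbf{s} + e \quad \text{where } e \leftarrow \chi',\ \mathbf{r} \overset{\$}{\leftarrow} \mathbb{Z}_q^{d} \tag{3}$$

For $\mathbf{X}_r, \mathbf{X}_s \in \mathbb{Z}_q^{\ell \times \ell}$,
$\mathbf{x}_s, \mathbf{x}_r \in \mathbb{Z}_q^{\ell}$, we'll show how to
transform such a sample into an SLWE sample
$\Gamma_{\chi',\ell,d+\delta}(\hat{\mathbf{s}}, [\mathbf{X}_r, \mathbf{x}_r], [\mathbf{X}_s, \mathbf{x}_s])$.
If $\mathrm{rank}(\mathbf{X}_r^{T}.\mathbf{X}_s) < d + \delta$ this sample is
simply $\bot$, so from now on we assume that this rank is at least $d + \delta$;
in this case the sample has the form

$$\hat{\mathbf{r}},\ (\mathbf{X}_r.\hat{\mathbf{r}} + \mathbf{x}_r)^{T}.(\mathbf{X}_s.\hat{\mathbf{s}} + \mathbf{x}_s) + e \quad \text{where } e \leftarrow \chi',\ \hat{\mathbf{r}} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell} \tag{4}$$

In our transformation, the SLWE secret
$\hat{\mathbf{s}} \in \mathbb{Z}_q^{\ell}$ is defined as a function of the LWE
secret $\mathbf{s} \in \mathbb{Z}_q^{d}$ as follows

$$\mathbf{R} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell \times d}, \quad \mathbf{b} \overset{\$}{\leftarrow} \mathbb{Z}_q^{\ell}, \quad \hat{\mathbf{s}} := \mathbf{R}.\mathbf{s} + \mathbf{b} \tag{5}$$

Note that we only know $\mathbf{R}, \mathbf{b}$ (which we sampled), but will not
get $\hat{\mathbf{s}}$ as we don't know $\mathbf{s}$. Also note that
$\hat{\mathbf{s}}$ is uniformly random as it is blinded with a uniform
$\mathbf{b}$. Define the set $\mathcal{L} \subseteq \mathbb{Z}_q^{\ell}$, which
is the set of solutions to a system of linear equations, as

$$\mathcal{L} := \{ \mathbf{y} : \mathbf{y}^{T}.\mathbf{X}_r^{T}.\mathbf{X}_s.\mathbf{R} = \mathbf{r}^{T} - \mathbf{x}_r^{T}.\mathbf{X}_s.\mathbf{R} \}. \tag{6}$$

If $\mathbf{X}_r^{T}.\mathbf{X}_s.\mathbf{R}$ has rank at least d, then
$\mathcal{L}$ is not empty as the linear equation considered in eq.(6) is
(over)defined (we will bound the probability that the rank is less than d
later.) In this case the LWE sample is transformed into an SLWE sample as

$$\underbrace{\mathbf{r},\ \mathbf{r}^{T}.\mathbf{s} + e}_{\text{LWE sample (3)}} \ \to\ \underbrace{\hat{\mathbf{r}},\ \mathbf{r}^{T}.\mathbf{s} + z + e}_{\text{SLWE sample (4)}} \quad \text{where } \hat{\mathbf{r}} \overset{\$}{\leftarrow} \mathcal{L} \tag{7}$$

and the z is computed from known values as

$$z := (\hat{\mathbf{r}}^{T}.\mathbf{X}_r^{T} + \mathbf{x}_r^{T}).\mathbf{X}_s.\mathbf{b} + (\mathbf{X}_r.\hat{\mathbf{r}} + \mathbf{x}_r)^{T}.\mathbf{x}_s$$

It follows from the three claims below that this sampling gives the right
distribution.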


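Claim. If $T := X_r^T.X_s.R$ has rank $\ge d$ then $\hat{r} \stackrel{\$}{\leftarrow} \mathcal{L}$ is uniformly random (given
$X_r, x_r, X_s, x_s, R, b$).


Proof (of Claim). Fix some $t \in \mathbb{Z}_2^{\ell}$ of weight $\mathrm{wt}(t) = d$ such that $T_{\downarrow t}$ has full
rank. Such a $t$ exists as $T$ has rank $d$.
By eq. (6), $\hat{r} \leftarrow \mathcal{L}$ is a random solution to the equation

$$\hat{r}^T.T = r^T - x_r^T.X_s.R$$

or equivalently (using $\hat{r}^T.T = \hat{r}_{\downarrow t}^T.T_{\downarrow t} + \hat{r}_{\downarrow \bar{t}}^T.T_{\downarrow \bar{t}}$)

$$\hat{r}_{\downarrow t}^T.T_{\downarrow t} = r^T - x_r^T.X_s.R - \hat{r}_{\downarrow \bar{t}}^T.T_{\downarrow \bar{t}} \qquad (8)$$

Now sampling a random $\hat{r}$ can be done as follows. First sample $\hat{r}_{\downarrow \bar{t}} \in \mathbb{Z}_q^{\ell-d}$
uniformly. The remaining $d$ positions $\hat{r}_{\downarrow t} \in \mathbb{Z}_q^{d}$ are then uniquely determined by
$r$ and given by the solution to the equation (8).

As $T_{\downarrow t}$ is a full rank square matrix, eq. (8) defines a bijection between $\hat{r}_{\downarrow t}$ and
$r$. As $r$ is chosen uniformly at random, also $\hat{r}_{\downarrow t}$ is uniformly random. Thus the
entire $\hat{r}$ is uniform as claimed.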


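Claim. The $\hat{r},\ r^T.s + z + e$ as sampled in (7) is an SLWE sample for secret $\hat{s}$,
randomness $\hat{r}$ and error $e$.

Proof (of Claim).

$$\begin{aligned}
&\hat{r},\ (X_r.\hat{r} + x_r)^T.(X_s.\hat{s} + x_s) + e && \text{(SLWE sample)}\\
=\ &\hat{r},\ (X_r.\hat{r} + x_r)^T.(X_s.\hat{s}) + (X_r.\hat{r} + x_r)^T.x_s + e\\
=\ &\hat{r},\ (X_r.\hat{r} + x_r)^T.(X_s.(R.s + b)) + (X_r.\hat{r} + x_r)^T.x_s + e\\
=\ &\hat{r},\ \underbrace{(\hat{r}^T.X_r^T + x_r^T).(X_s.R.s)}_{r^T.s\ \text{by (6)}} + \underbrace{(\hat{r}^T.X_r^T + x_r^T).X_s.b + (X_r.\hat{r} + x_r)^T.x_s}_{z} + e\\
=\ &\hat{r},\ r^T.s + z + e
\end{aligned}$$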
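The algebra of this claim can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it uses a small prime modulus and toy dimensions, and instead of solving the linear system (6) it picks $\hat{r}$ first and then defines $r$ so that (6) holds. All names and parameter values are hypothetical.

```python
import random

q = 97          # small prime modulus, chosen only for illustration
ell, d = 6, 3   # toy dimensions: SLWE secret length and LWE secret length


def rand_vec(n):
    return [random.randrange(q) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % q


def add(u, v):
    return [(a + b) % q for a, b in zip(u, v)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def check_once():
    s = rand_vec(d)                          # LWE secret
    R, b = rand_mat(ell, d), rand_vec(ell)   # blinding values from eq. (5)
    s_hat = add(mat_vec(R, s), b)            # SLWE secret s_hat = R.s + b
    X_r, X_s = rand_mat(ell, ell), rand_mat(ell, ell)
    x_r, x_s = rand_vec(ell), rand_vec(ell)
    r_hat = rand_vec(ell)
    e = random.randrange(q)                  # stand-in for an error term
    u = add(mat_vec(X_r, r_hat), x_r)        # X_r.r_hat + x_r
    # Define r so that r_hat satisfies eq. (6): r^T = u^T.X_s.R
    r = mat_vec(transpose(R), mat_vec(transpose(X_s), u))
    z = (dot(u, mat_vec(X_s, b)) + dot(u, x_s)) % q
    slwe_value = (dot(u, add(mat_vec(X_s, s_hat), x_s)) + e) % q
    shifted_lwe_value = (dot(r, s) + z + e) % q
    return slwe_value == shifted_lwe_value
```

Each call to `check_once()` draws fresh values and returns whether the SLWE value from (4) equals the shifted LWE value $r^T.s + z + e$ from (7).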


We have shown how to simulate an SLWE oracle $\Gamma_{\chi,\ell,d+\delta}(\hat{s},\cdot)$ from standard
LWE samples $\Lambda_{\chi,d}(s)$. This simulation goes well as long as we never get a query
containing $X_r, X_s$ where $\mathrm{rank}(X_r^T.X_s) \ge d+\delta$ (so the sample is not just $\bot$) but
where $\mathrm{rank}(X_r^T.X_s.R) < d$ (in this case $\mathcal{L}$ can be empty). The following claim
bounds the probability of this happening.


Claim. Consider any $X \in \mathbb{Z}_q^{\ell\times\ell}$ with $\mathrm{rank}(X) \ge d+\delta$, then (with $\Delta$ as defined
in Lemma 1)

$$\Pr[\mathrm{rank}(X.R) < d \;:\; R \stackrel{\$}{\leftarrow} \mathbb{Z}_q^{\ell\times d}] \le \Delta(q,d,\delta)$$

Proof (of Claim). Since the matrix $X$ has rank at least $d+\delta$, without loss of
generality, we can assume that the first $d+\delta$ rows of $X$ are linearly independent.
Since $R$ is a random matrix, the upper $(d+\delta)\times d$ submatrix of $X.R$ is a random
matrix in $\mathbb{Z}_q^{(d+\delta)\times d}$ and (by definition) such a matrix has rank strictly less than
$d$ with probability at most $\Delta(q,d,\delta)$. Thus $X.R$ has rank strictly less than $d$
with at most the same probability.


Using the union bound, we can upper bound the probability that for any of
the $t$ queries the matrix $X = X_r^T.X_s$ chosen by the distinguisher D will satisfy
$\mathrm{rank}(X.R) < d$ by

$$t\cdot\Delta(q,d,\delta) \le \frac{2\cdot t}{q^{\delta+1}}$$

This error probability is thus an upper bound on the gap between the success
probability $\epsilon'$ of D (in breaking SLWE) and the success probability $\epsilon$ we get in
breaking LWE using the transformation.

Above we ignored the fact that D can choose its queries, and thus the matrix
$X = X_r^T.X_s$, adaptively. To show that adaptivity does not help in picking an $X$
where $X.R$ has rank $< d$ we must show that the view of D is independent of
$R$ (except for the fact that so far no query was made where $\mathrm{rank}(X.R) < d$).
To see this first note that $\hat{s} = R.s + b$ is independent of $R$ as it is blinded
with a uniform $b$. In fact, the only reason we use this blinding is to enforce this
independence. The $\hat{r}$ are independent as they are uniform given $R$ as shown in
the first Claim in the proof of this theorem.
4 Applications


In this section we discuss some consequences and applications which use the fact 
that the new subspace LWE problem is as hard as the classical LWE problem. 


4.1 Security against Related Key Attacks 


Theorem 1 implies that many existing schemes whose security is based on the
standard LWE/LPN assumption are secure against attacks not anticipated by
the designers of the schemes.

As an illustrative example below we discuss the simple construction of
symmetric CPA secure encryption from LPN [GRS08]. We show that this
simple scheme is not only CPA secure, but it's even secure against powerful
related-key attacks. The scheme from [GRS08] is defined as follows


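Public Parameters
– Constants $0 < \tau < 0.5$, $\delta > 0$, $\ell \in \mathbb{N}$.
– An error correcting code over $\mathbb{Z}_2$ with encoding function $E$ and decoding
function $D$, where $D$ can correct up to $(\tau+\delta)\ell$ errors.


Key Generation: KG($\ell$) samples and outputs $s \stackrel{\$}{\leftarrow} \mathbb{Z}_2^{\ell}$.


Encryption: Enc($K$, $m$) samples a uniformly random binary matrix $R$ and an
error vector $e$ with i.i.d. $\mathrm{Ber}_\tau$ entries, and outputs the ciphertext
$(R, R^T.s \oplus e \oplus E(m))$.
Decryption: Dec($K$, $(R,z)$) outputs $D(z \oplus R^T.s)$.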


Correctness. To see that this scheme is correct, note that on input a correctly
generated ciphertext $(R, R^T.s \oplus e \oplus E(m))$, the decryption algorithm outputs
$D(e \oplus E(m))$, which is equal to $m$ unless the error vector $e$ has weight more than
$(\tau+\delta)\ell$. As the bits of $e$ are i.i.d. with each bit being one with probability $\tau$,
the probability of $e$ having such high weight can be upper bounded (using the
Chernoff bound) by an exponentially small probability $2^{-\gamma\ell}$ (for some $\gamma > 0$
which depends on $\tau, \delta$).
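A toy version of this scheme can be written in a few lines. The sketch below is an illustration under explicit assumptions, not the construction of [GRS08] itself: it encrypts a single bit, uses a repetition code with majority decoding as the error correcting code, and picks the key length, codeword length and noise rate arbitrarily. None of these values are meant to be secure.

```python
import random

ELL = 32    # key length (assumed toy value)
M = 201     # codeword length (assumed); repetition code for one message bit
TAU = 0.1   # Bernoulli noise rate (assumed)


def keygen(rng):
    return [rng.randrange(2) for _ in range(ELL)]


def encode(bit):
    # E: repetition code, one message bit repeated M times
    return [bit] * M


def decode(word):
    # D: majority vote, corrects up to M // 2 errors (M is odd)
    return int(sum(word) > M // 2)


def mask(R, s):
    # R^T.s over Z_2, with R stored as ELL rows of M bits
    return [sum(R[i][j] & s[i] for i in range(ELL)) % 2 for j in range(M)]


def encrypt(s, bit, rng):
    R = [[rng.randrange(2) for _ in range(M)] for _ in range(ELL)]
    e = [int(rng.random() < TAU) for _ in range(M)]   # i.i.d. Bernoulli(TAU) bits
    c = [a ^ b ^ w for a, b, w in zip(mask(R, s), e, encode(bit))]
    return R, c


def decrypt(s, ciphertext):
    R, z = ciphertext
    return decode([a ^ b for a, b in zip(z, mask(R, s))])
```

With these values decryption fails only if more than 100 of the 201 noise bits are set, which mirrors the Chernoff-style argument above.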


CPA Security. Recall that an encryption scheme is IND-CPA secure if no
efficient adversary A can win the following game with probability noticeably
better than 1/2:


1. We sample a key $s \stackrel{\$}{\leftarrow} \mathbb{Z}_2^{\ell}$ and a bit $b \stackrel{\$}{\leftarrow} \{0,1\}$.
2. A gets access to an oracle $\mathrm{Enc}_b(s,\cdot)$ where
   – $\mathrm{Enc}_0(s,m) = \mathrm{Enc}(s,m)$ (encrypt $m$)
   – $\mathrm{Enc}_1(s,m) = \mathrm{Enc}(s,0^{|m|})$ (encrypt dummy message)
3. A outputs $b'$ and wins if $b = b'$.


The IND-CPA security of the [GRS08] encryption scheme follows quite easily
from the LPN assumption, i.e. the fact that samples $(R, R^T.s + e)$ are
pseudorandom.
RKA Security. Classical security notions, like IND-CPA security, model the
encryption scheme as a “black-box”, where an adversary can only observe the
legitimate input-output behavior of the scheme. Unfortunately, in the last decade
it became evident that such idealized models fail to capture many real-world
attacks where an adversary can attack an actual physical implementation of the
scheme. An important example is direct leakage from the secret state, typically
by side-channel attacks or malware. To deal with this issue, in recent years many
“intrusion-resilient” and “leakage-resilient” schemes have been proposed [ISW03]
[MR04] [Dzi06] [DP07] [DP08] [ADW05] [Pie09] [CDD+07] [KV09] [DW05] [DKL09].

But the key can also leak indirectly, for example due to key-dependent messages
[BRS03] [HK07] [BHHO08] [HU08] [ACPS09] [BHHI10] [BG10] [ABBC10]. Here, as
the name suggests, one considers a setting where the encrypted message can depend
on the secret key. Another important setting is related-key attacks (RKA). In an
RKA attack on an encryption scheme the adversary can not only ask for
encryptions under the secret key $s$, but also under “related” keys. RKA attacks were first
considered by Biham and Knudsen [Knu92], and were extensively studied
in the last decade [Luc04] [BDK06] [BDK08] [FKL+00] [JD04] [ZZWF07] [BC10].
Bellare and Kohno [BK03] initiated a formal study of RKA attacks. All these works
consider RKA security of deterministic primitives, usually block-ciphers.

Very recently [AHI11] initiated a formal study of RKA security for
probabilistic encryption [GM84]. As in [BK03], they define RKA with respect to
related-key-deriving functions (RKD) $\Phi$. $\Phi$-RKA-IND-CPA security of an encryption
scheme is then defined almost like standard IND-CPA security, but where the
adversary can additionally apply any function $\phi \in \Phi$ to the secret key $s$, i.e.


1. We sample a key $s \stackrel{\$}{\leftarrow} \mathbb{Z}_2^{\ell}$ and a bit $b \stackrel{\$}{\leftarrow} \{0,1\}$.
2. A gets access to an oracle $\mathrm{Enc}^{\Phi}_b(s,\cdot,\cdot)$ where
   – $\mathrm{Enc}^{\Phi}_0(s,m,\phi \in \Phi) = \mathrm{Enc}(\phi(s),m)$ (encrypt $m$)
   – $\mathrm{Enc}^{\Phi}_1(s,m,\phi \in \Phi) = \mathrm{Enc}(\phi(s),0^{|m|})$ (encrypt dummy message)
3. A outputs $b'$ and wins if $b = b'$.


In [AHI11] it is shown that [GRS08] is $\Phi^{\oplus}$-RKA-IND-CPA secure where $\Phi^{\oplus}$ is
the class of XOR relations. This class contains, for every $\Delta \in \mathbb{Z}_2^{\ell}$, the function
$\phi_\Delta(s) := s \oplus \Delta$.

This is an interesting class of relations as (1) it captures realistic RKA and
(2) many existing schemes (mostly block-ciphers) have actually been shown to
be insecure against $\Phi^{\oplus}$-RKA. Unfortunately $\Phi^{\oplus}$-RKA security does not imply
any security in the realistic scenario where an adversary can not only flip, but
set some of the bits of the secret key. Neither does it cover the case where the
adversary can exchange the position of the key bits.

Using Theorem 1 we can show that the scheme is in fact secure against a
much more powerful class of “affine relations”, which as special cases contains
the relations just mentioned. Let $\Phi^{\mathrm{aff}}_d$ be the class which contains the functions

$$\phi_{X,x}(s) = X^T.s \oplus x$$

for every $X \in \mathbb{Z}_2^{\ell\times\ell}$, $x \in \mathbb{Z}_2^{\ell}$ where $\mathrm{rank}(X) \ge d$.
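To make the affine class concrete, the sketch below evaluates $\phi_{X,x}(s) = X^T.s \oplus x$ over $\mathbb{Z}_2$ and builds the pair $(X, x)$ for three special cases: XOR with a fixed offset, a permutation of the key bits, and overwriting some key bits. The helper names and the encoding of matrices as lists of rows are assumptions made for this illustration.

```python
def phi(X, x, s):
    """phi_{X,x}(s) = X^T.s XOR x over Z_2, with X given as a list of rows."""
    n = len(s)
    return [(sum(X[j][i] & s[j] for j in range(n)) % 2) ^ x[i] for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def xor_relation(delta):
    # s -> s XOR delta: X is the identity, x = delta
    return identity(len(delta)), list(delta)


def permutation_relation(perm):
    # output bit i equals input bit perm[i]: X[perm[i]][i] = 1, x = 0
    n = len(perm)
    X = [[0] * n for _ in range(n)]
    for i, src in enumerate(perm):
        X[src][i] = 1
    return X, [0] * n


def set_relation(n, fixed):
    # `fixed` maps a position to the bit forced there; other bits are kept.
    # X is diagonal with rank n - len(fixed).
    X = [[int(i == j and i not in fixed) for j in range(n)] for i in range(n)]
    x = [fixed.get(i, 0) for i in range(n)]
    return X, x
```

For example, with `s = [1, 0, 1, 1, 0]`, `phi(*set_relation(5, {0: 0, 4: 1}), s)` forces the first bit to 0 and the last bit to 1 while keeping the middle three bits; here $X$ has rank 3.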
Proposition 1. Under the (decisional) $(\tau,\ell,d)$-SLPN assumption (which by
Theorem 1 is equivalent to the standard LPN assumption), the encryption scheme
from [GRS08] is $\Phi^{\mathrm{aff}}_d$-RKA-IND-CPA secure.


Proof. For any $\phi \in \Phi^{\mathrm{aff}}_d$, samples of the form $R, R^T.\phi(s) + e$ are pseudorandom
by assumption. So the outputs of both $\mathrm{Enc}^{\Phi^{\mathrm{aff}}_d}_0(s,\cdot,\cdot)$ and $\mathrm{Enc}^{\Phi^{\mathrm{aff}}_d}_1(s,\cdot,\cdot)$ are
pseudorandom and thus indistinguishable.


$\Phi^{\mathrm{aff}}_d$ is a very powerful class of relations, and captures many realistic settings. It
contains $\Phi^{\oplus}$, but also the class of relations $\Phi^{\mathrm{set}}_d \subset \Phi^{\mathrm{aff}}_d$ which allows to overwrite
all but $d$ bits of the input, and the class $\Phi^{\mathrm{perm}} \subset \Phi^{\mathrm{aff}}_d$ which allows to permute
the key bits. Previous to our work no scheme was known to be provably secure
against $\Phi^{\mathrm{aff}}_d$, or even just for one of the special cases $\Phi^{\mathrm{set}}_d$ (for $d > 0$) or $\Phi^{\mathrm{perm}}$.
In fact, no deterministic encryption scheme can be secure against $\Phi^{\mathrm{perm}}$, and no
“natural” deterministic scheme can be secure against $\Phi^{\mathrm{set}}_d$.


4.2 Weak Randomness and New Constructions 


The RKA security example from the previous section used the fact that an
adversary can apply any affine function to the LWE secret. There are also natural
implications from the fact that she can apply a mapping to the randomness $r$. For
example, it implies that LWE is hard, even if the randomness $r$ used to generate
the samples $r^T.s + e$ is not uniform, but comes from a bit-fixing source [CGH+85].
Let us stress that the (comparably small) amount of randomness necessary to
sample the error $e$ must be uniform.

Theorem 1 not only has implications for existing constructions, but in
subsequent work has inspired completely new constructions, most notably the
authentication schemes and message authentication codes proposed in [KPC+11].
Abstract. In this work, we show how to construct IBE schemes that
are secure against a bounded number of collusions, starting with
underlying PKE schemes which possess linear homomorphisms over their keys.
In particular, this enables us to exhibit a new (bounded-collusion) IBE
construction based on the quadratic residuosity assumption, without any
need to assume the existence of random oracles. The new IBE’s public
parameters are of size $O(t\lambda \log I)$ where $I$ is the total number of
identities which can be supported by the system, $t$ is the number of collusions
which the system is secure against, and $\lambda$ is a security parameter. While
the number of collusions is bounded, we note that an exponential number
of total identities can be supported.

More generally, we give a transformation that takes any PKE satis- 
fying Linear Key Homomorphism, Identity Map Compatibility, and the 
Linear Hash Proof Property and translates it into an IBE secure against 
bounded collusions. We demonstrate that these properties are more gen- 
eral than our quadratic residuosity-based scheme by showing how a sim- 
ple PKE based on the DDH assumption also satisfies these properties. 


1 Introduction 


The last decade in the lifetime of cryptography has been quite exciting. We are 
witnessing a paradigm shift, departing from the traditional goals of secure and 
authenticated communication and moving towards systems that are simultane- 
ously highly secure, highly functional, and highly flexible in allowing selected 
access to encrypted data. As part of this development, different “types” of en- 
cryption systems have been conceived and constructed to allow greater ability to 
meaningfully manipulate and control access to encrypted data, such as bounded 
and fully homomorphic encryption (FHE), identity-based encryption (IBE), hier- 
archical identity-based encryption (HIBE), functional encryption (FE), attribute 
based encryption (ABE), and others. As is typical at any time of rapid innovation,
the field is today in a somewhat chaotic state. The different primitives of
FHE, IBE, HIBE, FE, and ABE are being implemented based on different
computational assumptions and as of yet we do not know of general constructions.

One way to put some order in the picture is to investigate reductions between
the various primitives. A beautiful example of such a result was recently shown by
Rothblum [29], who demonstrated a simple reduction from any semantically
secure private key encryption scheme which possesses a simple homomorphic
property over its ciphertexts to a full-fledged semantically secure public key
encryption scheme. The homomorphic property requires that the product of a
pair of ciphertexts $c_1$ and $c_2$, whose corresponding plaintexts are $m_1$ and $m_2$,
yields a new ciphertext $c_1 \cdot c_2$ which decrypts to $m_1 + m_2 \bmod 2$.

In this paper, we continue this line of investigation and show how public-key 
encryption schemes which possess a linear homomorphic property over their keys
as well as hash proof system features with certain algebraic structure can be 
used to construct an efficient identity-based encryption (IBE) scheme that is 
secure against bounded collusions. The main idea is simple. In a nutshell, the 
homomorphism over the keys will give us a way to map a set of public keys 
published by the master authority in an IBE system into a new user-specific 
public key that is obtained by taking a linear combination of the published keys. 
By taking a linear combination instead of a subset, we are able to achieve smaller 
keys than a strictly combinatorial approach would allow. Our constructions allow 
the total number of potential identities to be exponential in the size of the 
public parameters of the IBE. The challenge will be to prove that the resulting 
cryptosystem is secure even in the presence of a specified number of colluding
users. For this, we rely on an algebraic hash proof property. 

To explain our results in the context of the known literature, let us quickly 
review some relevant highlights in the history of IBEs. The Identity-Based En- 
cryption model was conceived by Shamir in the early 1980s [30]. The first con- 
structions were proposed in 2001 by Boneh and Franklin [6] based on the hard- 
ness of the bilinear Diffie-Hellman problem and by Cocks [13] based on the 
hardness of the quadratic residuosity problem. Both works relied on the random 
oracle model. Whereas the quadratic residuosity problem has been used in the 
context of cryptography since the early eighties [22], computational problems 
employing bilinear pairings were at the time of [6] relative newcomers to the 
field. Indeed, inspired by their extensive usage within the context of IBEs, the 
richness of bilinear group problems has proved tremendously useful for solving 
other cryptographic challenges (e.g. in the area of leakage-resilient systems). 

Removing the assumption that random oracles exist in the construction of 
IBEs and their variants was the next theoretical target. A long progression of 
results ensued. At first, partial success for IBE based on bilinear group assump- 
tions was achieved by producing IBEs in the standard model provably satisfying 
a more relaxed security condition known as selective security [I4], whereas 
the most desirable of security guarantees is that any polynomial-time attacker 
who can request secret keys for identities of its choice cannot launch a successful 
chosen-ciphertext attack (CCA) against a new adaptively-chosen challenge
identity. Enlarging the arsenal of computational complexity bases for IBE, Gentry,
Peikert, and Vaikuntanathan proposed an IBE based on the intractability
of the learning with errors (LWE) problem, still in the random oracle model.
Ultimately, fully (unrelaxed) secure IBEs were constructed in the standard model
(without assuming random oracles) under the decisional Bilinear Diffie-Hellman
assumption by Boneh and Boyen [5] and Waters [34], and most recently under
the LWE assumption by Cash, Hofheinz, Kiltz, and Peikert and by Agrawal,
Boneh, and Boyen [1]. Constructing a fully secure (or even selectively secure)
IBE without resorting to the random oracle model based on classical number
theoretic assumptions such as DDH in non-bilinear groups or the hardness of
quadratic residuosity remains open.

A different relaxation of IBE comes up in the work of Dodis, Katz, Xu, and
Yung [16] in the context of their study of the problem of a bounded number of
secret key exposures in public-key encryption. To remedy the latter problem, they
introduced the notion of key-insulated PKE systems and showed its equivalence to
IBEs semantically secure against a bounded number of colluding identities. This
equivalence coupled with constructions of key-insulated PKEs by [16] yields a
generic combinatorial construction which converts any semantically secure PKE to
a bounded-collusion semantically secure IBE, without needing a random oracle.


New Results. The goal of our work is to point to a new direction in the con- 
struction of IBE schemes: the utilization of homomorphic properties over keys of 
PKE schemes (when they exist) to obtain IBE constructions. This may provide 
a way to diversify the assumptions on which IBEs can be based. In particular, 
we are interested in obtaining IBE constructions based on quadratic residuosity 
in the standard model. 

In recent years, several PKE schemes were proposed with interesting
homomorphisms over the public keys and the underlying secret keys. These were
constructed for the purpose of showing circular security and leakage resilience
properties. In particular, for both the scheme of Boneh, Halevi, Hamburg, and
Ostrovsky [8] and the scheme of Brakerski and Goldwasser [9], it can be shown
that starting with two valid (public-key, secret-key) pairs $(pk_1, sk_1)$, $(pk_2, sk_2)$,
one can obtain a third valid pair as $(pk_1 \cdot pk_2, sk_1 + sk_2)$.

We define properties of a PKE scheme allowing homomorphism over keys 
that suffice to transform the PKE into an IBE scheme with bounded collusion 
resistance. As examples of our general framework, we show how to turn the 
schemes of [8] and a modification of [9] into two IBE schemes in the standard 
model (that is, without random oracles), which are CPA secure against bounded 
collusions. Namely, security holds when the adversary is restricted to receive t 
secret keys for identities of its choice for a pre-specified t. We allow the adversary 
to choose its attack target adaptively. The security of the scheme we present here 
is based on the intractability of the quadratic residuosity problem. In the full 
version of this paper, we also present a second scheme with security based on the 
intractability of DDH. Letting the public parameters of the IBE be of size $O(n\lambda)$
where $\lambda$ is a security parameter, the new DDH-based IBE will be secure as long
as the adversary is restricted to receive t secret keys for adaptively chosen ID’s 
where t = n — 1. The QR-based IBE will be secure as long as the adversary is 
restricted to receive t secret keys for t = eas — 1, where J is the total number 
of users (or identities) that can be supported by the system. There is no upper 
bound on J, which can be exponential in the size of public parameters. 

Let us compare what we achieve to the constructions obtained by [16]. In
their generic combinatorial construction, they start with any PKE and obtain
a bounded-collusion IBE, requiring public parameters to be of size $O(t^2 \log I)$
times the size of public keys in the PKE scheme and secret keys to be of size
$O(t \log I)$ times the size of secret keys in the PKE scheme for $t$ collusions and $I$
total identities supported. Their approach employs explicit constructions of sets
$S_1, \ldots, S_I$ of some finite universe $U$ such that the union of any $t$ of these does not
cover any additional set. There is a public key of the PKE scheme generated for
each element of $U$, and each set $S_i$ corresponds to a set of $|S_i|$ PKE secret keys.
There are intrinsic bounds on the values of $I, |U|, t$ for which this works, and
note that their values of $|U| = O(t^2 \log I)$ and $|S_i| = O(t \log I)$ for each $i$
are essentially optimal. In contrast, by exploiting the algebraic homomorphisms
over the keys, we require public parameters of size roughly $O(t \cdot \log I)$ times the
size of public keys and secret keys which are $O(\lambda)$ (within a constant times the
size of PKE secret keys) for our quadratic residuosity based scheme. (This is
assuming a certain relationship between the security parameter $\lambda$ and $n$. See the
statement of Theorem 2 for details.)

In [16], they also provide a DDH-based key-insulated PKE scheme which is
more efficient than their generic construction. It has $O(t\lambda)$ size public parameters
and $O(\lambda)$ size secret keys. Viewing their scheme in the identity based context
results in, perhaps surprisingly, the DDH based scheme we obtain by exploiting
the homomorphism over the keys in BHHO [8]. In the full version of this paper,
we describe this scheme and show it can be proved secure against $t$ collusions
using our framework.


1.1 Overview of the Techniques 


The basic idea is to exploit homomorphism over the keys in a PKE system $\Pi$.
The high-level overview is as follows.
Start with a PKE $\Pi$ with the following properties:


1. The secret keys are vectors of elements in a ring $R$ with operations $(+, \cdot)$
and the public keys consist of elements in a group $G$.

2. If $(pk_1, sk_1)$ and $(pk_2, sk_2)$ are valid keypairs of $\Pi$ and $a,b \in R$, then
$a\,sk_1 + b\,sk_2$ is also a valid secret key of $\Pi$, with a corresponding public key that
can be efficiently computed from $pk_1, pk_2, a, b$. For the schemes we present,
this public key is computed as $pk_1^{a} \cdot pk_2^{b}$.


We note that many existing cryptosystems have this property, or can be made 

to have this property with trivial modifications, including [8], [9], and [74]. 
The trusted master authority in an IBE will then choose $n$ pairs of $(pk_i, sk_i)$
($i = 1,\ldots,n$) using the key generation algorithm of $\Pi$, publish the $n$ $pk_i$ values,
and keep secret the corresponding $n$ $sk_i$’s. Each identity is mapped to a vector
$(id_1,\ldots,id_n)$ in $R^n$ (we abuse terminology slightly here since $R$ is only required to be
a ring and not a field, but we will still call these “vectors”). The secret key for
the identity is computed as a coordinate-wise linear combination of the vectors
$sk_1,\ldots,sk_n$, with coefficients $id_1,\ldots,id_n$ respectively, i.e.

$$SK_{ID} := \sum_{i=1}^{n} (sk_i \cdot id_i)$$

where all additions take place in $R$.

Anyone can compute the matching public key $PK_{ID}$ using the key
homomorphism and the published $pk_i$ values. Since by the key homomorphism
$(PK_{ID}, SK_{ID})$ is still a valid key pair for the original PKE, encryption and decryption
can function identically to before. The encryptor simply runs the encryption
algorithm for $\Pi$ using $PK_{ID}$, and the decryptor runs the decryption algorithm
for $\Pi$ using $SK_{ID}$.

We refer to the combination of a PKE scheme with this homomorphic property 
over keys and a mapping for identities as having the linear key homomorphism 
and identity map compatibility properties. To prove security for the resulting 
bounded-collusion IBE construction, one can intuitively see that we need the 
map taking identities to vectors to produce linearly independent outputs for any 
distinct t + 1 identities. This is required to ensure that any t colluding users 
will not be able to compute a secret key for another user as a known linear 
combination of their own secret keys. To obtain our full security proof, we define 
an algebraic property of the PKE scheme in combination with the identity map, 
called the linear hash proof property, which encompasses this requirement on any 
t+1 images of the map and more. The definition of this property is inspired by the 
paradigm of hash proof systems (introduced by Cramer and Shoup [14]), though 
it differs from this in many ways. We define valid and invalid ciphertexts for our 
systems, where valid ciphertexts decrypt properly and invalid ciphertexts should 
decrypt randomly over the set of many secret keys corresponding to a single 
public key. We require that valid and invalid ciphertexts are computationally 
indistinguishable. So far this is quite similar to the previous uses of hash proof 
systems. However, the identity-based setting introduces a further challenge in 
proving security by changing to an invalid ciphertext, since now the adversary’s 
view additionally includes the secret keys that it may request for other identities. 
Hence, we must prove that an invalid ciphertext decrypts randomly over the 
subset of secret keys that are consistent not only with the public keys, but also 
with the received secret keys. 

Controlling the behavior over this set of consistent keys in the QR-based
setting is particularly challenging, where the mathematical analysis is quite subtle
due to the fact that secret keys must be treated as integers in a bounded range
while public keys are elements of a subgroup of $\mathbb{Z}_N^*$. To prove the linear hash
proof property for our QR-based system, we employ technical bounds concerning
the intersection of a shifted lattice in $\mathbb{Z}^n$ with a “bounding box” of elements of
$\mathbb{Z}^n$ whose coordinates all lie within a specified finite range.


Bounded-Collusion IBE from Key Homomorphism 569 


1.2 Other Related Work 


In addition to those referenced above, constructions of IBE schemes in the
standard model in the bilinear setting were also provided by Gentry [20] under the
q-ABDHE assumption, and by Waters under the bilinear Diffie-Hellman and
decisional linear assumptions. Another construction based on quadratic
residuosity in the random oracle model was provided by Boneh, Gentry, and Hamburg
[7]. Leakage-resilient IBE schemes in various models have also been constructed,
for example by Alwen, Dodis, Naor, Segev, Walfish, and Wichs [2], by
Brakerski, Kalai, Katz, and Vaikuntanathan [10], and by Lewko, Rouselakis, and
Waters [26].

The property we require for our PKE schemes in addition to key homomor- 
phism is a variant of the structure of hash proof systems, which were first in- 
troduced by Cramer and Shoup as a paradigm for proving CCA security of 
PKE schemes [14]. Hash proof systems have recently been used in the context 
of leakage-resilience as well ([28], for example), extending to the identity-based 
setting in [2]. We note that the primitive of identity-based hash proof systems 
introduced in [2] takes a different direction than our work, and the instantiation 
they provide from the quadratic residuosity assumption relies on the random 
oracle model. 

The relaxation to bounded collusion resistance has also been well-studied in
the context of broadcast encryption and revocation schemes, dating back to
the introduction of broadcast encryption by Fiat and Naor [17]. This work and
several follow up works employed combinatorial techniques [31, 32, 33, 18, 25, 19].
Another combinatorial approach, the subset cover framework, was introduced by
Naor, Naor, and Lotspiech [27] to build a revocation scheme. In this framework,
users are associated with subsets of keys. The trusted system designer can then
broadcast an encrypted message by selecting a family of subsets which covers all
the desired recipients and none of the undesired ones. An improvement to the
NNL scheme was later given by Halevy and Shamir [24], and these techniques
were then extended to the public key setting by Dodis and Fazio [15].


2 Preliminaries 


2.1 IND-CPA Security for Bounded-Collusion IBE 


We define IND-CPA security for bounded-collusion IBE in terms of the follow- 
ing game between a challenger and an attacker. We let t denote our threshold 
parameter for collusion resistance. The game proceeds in phases: 


Setup Phase. The challenger runs the setup algorithm to produce the public 
parameters and master secret key. It gives the public parameters to the attacker. 


Query Phase I. The challenger initializes a counter to be 0. The attacker may
then submit key queries for various identities. In response to a key query, the
challenger increments its counter. If the resulting counter value is $\le t$, the
challenger generates a secret key for the requested identity by running the key
generation algorithm. It gives the secret key to the attacker. If the counter value is
$> t$, it does not respond to the query.


Challenge Phase. The attacker specifies messages $m_0, m_1$ and an identity $ID^*$
that was not queried in the preceding query phase. The challenger chooses a
random bit $b \in \{0,1\}$, encrypts $m_b$ to identity $ID^*$ using the encryption algorithm,
and gives the ciphertext to the attacker.


Query Phase II. The attacker may again submit key queries for various identities 
not equal to ID*, and the challenger will respond as in the first query phase. We 
note that the same counter is employed, so that only t total queries in the game 
are answered with secret keys. 


Guess. The attacker outputs a guess b’ for b. 


We define the advantage of an attacker $\mathcal{A}$ in the above game to be $Adv_{\mathcal{A}} =
|\Pr[b' = b] - \frac{1}{2}|$. We say a bounded-collusion IBE system with parameter $t$ is
secure if any PPT attacker $\mathcal{A}$ has only a negligible advantage in this game.
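The bookkeeping of this game (one shared counter, at most $t$ answered key queries, and a challenge identity that is never queried) can be sketched as follows. The class and method names are hypothetical, and `TrivialIBE` is only a stand-in with no security, included so that the sketch runs; any object offering `setup`, `keygen` and `encrypt` in this shape could be substituted.

```python
import random


class TrivialIBE:
    """Placeholder scheme with no security, used only to exercise the game."""

    def setup(self):
        return "params", "msk"

    def keygen(self, msk, identity):
        return ("sk", identity)

    def encrypt(self, params, identity, message):
        return ("ct", identity, message)


class BoundedCollusionChallenger:
    def __init__(self, ibe, t, rng=None):
        self.ibe, self.t = ibe, t
        self.rng = rng or random.Random()
        self.params, self.msk = ibe.setup()   # Setup Phase
        self.counter = 0                      # shared by both query phases
        self.queried = set()
        self.challenge_id = None
        self.b = None

    def key_query(self, identity):
        if identity == self.challenge_id:
            raise ValueError("key queries for the challenge identity are not allowed")
        self.counter += 1
        self.queried.add(identity)
        if self.counter > self.t:
            return None                       # no response beyond t queries
        return self.ibe.keygen(self.msk, identity)

    def challenge(self, m0, m1, identity):
        if identity in self.queried:
            raise ValueError("challenge identity was already queried")
        self.challenge_id = identity
        self.b = self.rng.randrange(2)
        return self.ibe.encrypt(self.params, identity, (m0, m1)[self.b])

    def attacker_wins(self, guess):
        return guess == self.b
```

An attacker's advantage would then be estimated as the distance of its winning frequency from 1/2 over many independent runs of the game.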


2.2 Complexity Assumption 


We formally state the QR assumption. We let A denote the security parameter. 


Quadratic Residuosity Assumption. We let N = pq where p,q are random A-bit 
primes. We require p,q = 3 (mod 4), i.e. N is a Blum integer. We let Jy denote 
the elements of Z3, with Jacobi symbol equal to 1, and we let QR y denote the 
set of quadratic residues modulo N. Both of these are multiplicative subgroups 
of Zi, with orders ow) and oN) respectively. We note that oN) is odd, and 
that —1 is an element of Jy, but is not a square modulo N. As a consequence, 
Jn is isomorphic to {+1,—1} x QRy. We let u denote an element of QRy 
chosen uniformly at random, and h denote an element of Jy chosen uniformly 
at random. For any algorithm A, we define the advantage of A against the QR 
problem to be: 


Adv |Pr [A(N, u) = 1] — Pr [A(N,h) = 1]]. 


We further restrict our choice of $N$ to values such that $QR_N$ is cyclic. We note
that this is satisfied when $p, q$ are strong primes, meaning $p = 2p' + 1$, $q = 2q' + 1$,
where $p, q, p', q'$ are all distinct odd primes. This restriction was previously
imposed in [14], where the authors note that this restricted version implies the usual
formulation of the quadratic residuosity assumption if one additionally assumes
that strong primes are sufficiently dense. We say that the QR assumption holds
if for all PPT $A$, $\mathrm{Adv}^{QR}_A$ is negligible in $\lambda$.

Furthermore, we note that this definition is equivalent to one in which $A$
receives a random element $h$ of $J_N \setminus QR_N$ instead of $J_N$.
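
The group structure stated above can be checked by brute force on a toy modulus. The following is a minimal sketch, assuming the illustrative (and entirely insecure) choice $N = 7 \cdot 11$, whose factors are strong primes; the helper name `jacobi` and all variable names are ours, not part of the paper.

```python
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n, via quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

p, q = 7, 11                       # toy strong primes: 7 = 2*3 + 1, 11 = 2*5 + 1
N = p * q
phi = (p - 1) * (q - 1)

units = [x for x in range(1, N) if gcd(x, N) == 1]
J_N = {x for x in units if jacobi(x, N) == 1}
QR_N = {x * x % N for x in units}

order_J, order_QR = len(J_N), len(QR_N)
minus_one_in_J = (N - 1) in J_N
minus_one_is_square = (N - 1) in QR_N
# J_N = {+1, -1} x QR_N: every element of J_N is u or -u for a square u.
J_splits = J_N == QR_N | {(N - u) % N for u in QR_N}
# QR_N is cyclic if some element generates all of it.
qr_is_cyclic = any(len({pow(g, e, N) for e in range(order_QR)}) == order_QR
                   for g in QR_N)
```

For this toy modulus the orders come out as $\phi(N)/2$ and $\phi(N)/4$, and $-1$ lies in $J_N$ without being a square, matching the statements in the text.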


2.3 Mapping Identities to Linearly Independent Vectors


To employ our strategy of transforming PKE schemes with homomorphic prop-
erties over keys into IBE schemes with polynomial collusion resistance, we first
need methods for efficiently mapping identities to linearly independent vectors
over various fields. This can be done using generating matrices for the
Reed-Solomon codes over $\mathbb{Z}_p$ and dual BCH codes over $\mathbb{Z}_2$. The proofs of the following
lemmas can be found in the full version.


Lemma 1. For any prime $p$ and any $t + 1 < p$, there exists an efficiently-
computable mapping $f : \mathbb{Z}_p \to \mathbb{Z}_p^{t+1}$ such that for any distinct $x_1, x_2, \ldots, x_{t+1} \in \mathbb{Z}_p$,
the vectors $f(x_1), f(x_2), \ldots, f(x_{t+1})$ are linearly independent.


Lemma 2. For any positive integer $k$ and any $t + 1 < 2^k$, there exists an
efficiently-computable mapping $f : \{0,1\}^k \to \{0,1\}^{(t+1)k}$ such that for any
distinct $x_1, x_2, \ldots, x_{t+1} \in \{0,1\}^k$, the vectors $f(x_1), f(x_2), \ldots, f(x_{t+1})$ are linearly
independent over $\mathbb{Z}_2$.
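
To make Lemma 1 concrete, here is a minimal sketch under the assumption that the map is the Vandermonde row $x \mapsto (1, x, \ldots, x^t)$, which is the usual generating-matrix column for Reed-Solomon codes. The paper defers the actual proof to the full version, so this choice of $f$, the helper `rank_mod_p`, and the toy parameters are our assumptions.

```python
from itertools import combinations

def f(x, t, p):
    """Map x in Z_p to the Vandermonde row (1, x, ..., x^t) in Z_p^(t+1)."""
    return [pow(x, j, p) for j in range(t + 1)]

def rank_mod_p(rows, p):
    """Rank of a matrix over Z_p, by Gaussian elimination."""
    m = [r[:] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [v * inv % p for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] % p:
                c = m[i][col]
                m[i] = [(a - c * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

p, t = 11, 3                                    # toy parameters with t + 1 < p
ids = [2, 5, 7, 10]                             # any t + 1 distinct elements of Z_p
full_rank = rank_mod_p([f(x, t, p) for x in ids], p) == t + 1

# Exhaustive check over every choice of t + 1 distinct identities.
all_independent = all(
    rank_mod_p([f(x, t, p) for x in subset], p) == t + 1
    for subset in combinations(range(p), t + 1)
)
```

Independence here is the classical fact that a square Vandermonde matrix on distinct points is invertible. An analogous map for Lemma 2 would need arithmetic in a field of size $2^k$, which this sketch does not attempt.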


3 From PKE to Bounded Collusion IBE: General 
Conditions and Construction 


We start with a public key scheme and an efficiently computable mapping f 
on identities that jointly have the following useful properties. We separate the 
public keys of the PKE into public parameters (distributed independently of the 
secret key) and user-specific data; the latter is referred to as the “public key”. 


3.1 Linear Key Homomorphism 


We say a PKE has linear key homomorphism if the following requirements hold.
First, its secret keys are generated randomly from $d$-tuples of a ring $R$ for some
positive integer $d$, with a distribution that is independent and uniform in each
coordinate over some subset $R'$ of $R$. Second, starting with any two secret keys
$sk_1, sk_2$ each in $R^d$ and any $r_1, r_2 \in R$, the component-wise $R$-linear combination
formed by $r_1 sk_1 + r_2 sk_2$ also functions as a secret key, with a corresponding public
key that can be computed efficiently from $r_1, r_2$ and the public keys $pk_1$ and $pk_2$
of $sk_1$ and $sk_2$ respectively, fixing the same public parameters. We note that
$r_1 sk_1 + r_2 sk_2$ may not have all entries in $R'$, but it should still function properly
as a key.


3.2 Identity Map Compatibility 


We say the identity mapping $f$ is compatible with a PKE scheme with linear
key homomorphism if $f$ maps identities into $n$-tuples of elements of $R$. Letting
$I$ denote the number of identities, the action of $f$ can be represented by an $I \times n$
matrix with entries in $R$. We denote this matrix by $F$ and its rows by $f_1, \ldots, f_I$.


3.3 Linear Hash Proof Property


We now define the strongest property we require, which we call the linear hash 
proof property. This property is inspired by the paradigm of hash proof systems, 
but we deviate from that paradigm in several respects. In hash proof systems, 
a single public key corresponds to many possible secret keys. There are two en- 
cryption algorithms: a valid one and an invalid one. Valid ciphertexts decrypt 
properly when one uses any of the secret keys associated to the public key, while 
invalid ciphertexts decrypt differently when different secret keys are used. Our 
linear hash proof property will consider several public keys at once, each corre- 
sponding to a set of many possible secret keys. The adversary will be given these 
public keys, along with some linear combinations of fixed secret keys correspond- 
ing to the public keys. We will also have valid and invalid encryption algorithms. 
Our valid ciphertexts will behave properly. When an invalid ciphertext is formed 
for a public key corresponding to a linear combination of the secret keys that 
is independent of the revealed combinations, the invalid ciphertext will decrypt 
“randomly” when one chooses a random key from the set of secret keys that are 
consistent with the adversary’s view. 

To define this property more formally, we first need to define some additional 
notation. We consider a PKE scheme with linear key homomorphism which 
comes equipped with a compatible identity map f and an additional algorithm 
InvalidEncrypt which takes in a message and a secret key sk and outputs a ci- 
phertext (note that the invalid encryption algorithm does not necessarily need to 
be efficient). The regular and invalid encryption algorithms produce two distri- 
butions of ciphertexts. We call these valid and invalid ciphertexts. Correctness 
of decryption must hold for valid ciphertexts. 

We let $(sk_1, pk_1), (sk_2, pk_2), \ldots, (sk_n, pk_n)$ be $n$ randomly generated key pairs,
where all of $sk_1, \ldots, sk_n$ are $d$-tuples in a ring $R$. (Here we assume that the key
generation algorithm chooses $R, d$ and then generates a key pair. We fix $R$ and
then run the rest of the algorithm independently $n$ times to produce the $n$ key
pairs.) We define $S$ to be the $n \times d$ matrix with entries in $R$ whose $i$-th row
contains $sk_i$.

Fix any $t + 1$ distinct rows of the matrix of identity vectors $F$, denoted by
$f_{i_1}, \ldots, f_{i_{t+1}}$. We let $sk_{ID_{i_{t+1}}}$ denote the secret key $f_{i_{t+1}} \cdot S$ and $pk_{ID_{i_{t+1}}}$ denote
the corresponding public key (computed via the key homomorphism). We let
$\mathrm{Ker}_R(f_{i_1}, \ldots, f_{i_t})$ denote the kernel of the $t \times n$ submatrix of $F$ formed by
the first $t$ of these rows; that is, it consists of the vectors $v \in R^n$ such that $f_{i_j} \cdot v = 0$ for all
$j$ from 1 to $t$.

Now we consider the set of possible secret key matrices given the public
and secret key information available to an adversary who has queried identities
$i_1, \ldots, i_t$. We let $W$ denote the set of matrices in $R^{n \times d}$ whose columns belong
to $\mathrm{Ker}_R(f_{i_1}, \ldots, f_{i_t})$ and whose rows $w_i$ satisfy that $sk_i + w_i$ has the same
public key as $sk_i$ for all $i$. Since $W$'s columns are orthogonal to the identity
vectors $f_{i_1}, \ldots, f_{i_t}$, adding an element of $W$ to $S$ does not change any of the
secret keys $f_{i_j} \cdot S$. Furthermore, by construction, adding an element of $W$ to $S$
does not change the public keys associated with the scheme.

We define the subset $\tilde{S}$ of $R^{n \times d}$ to be the set of all matrices in $S + W :=
\{S + W_0 \mid W_0 \in W\}$, intersected with the set of all matrices of $n$ secret keys that
can be generated by the key generation algorithm (i.e. those with components in
$R'$). Intuitively, $\tilde{S}$ is the set of all possible $n \times d$ secret key matrices that are
"consistent" with the $n$ public keys $pk_1, \ldots, pk_n$ and the $t$ secret keys $f_{i_1} \cdot S, \ldots, f_{i_t} \cdot S$.
In other words, after seeing these values, even an information-theoretic adversary
cannot determine $S$ uniquely; only the set $\tilde{S}$ can be determined.

We say that a PKE scheme with linear key homomorphism is a linear
hash proof system with respect to the compatible map $f$ if the following
two requirements are satisfied. We refer to these requirements as uniform
decryption of invalid ciphertexts and computational indistinguishability of valid/invalid
ciphertexts.


Uniform Decryption of Invalid Ciphertexts. With all but negligible probability
over the choice of $sk_1, pk_1, \ldots, sk_n, pk_n$ and the random coins of the invalid
encryption algorithm, for any choice of distinct rows $f_{i_1}, \ldots, f_{i_{t+1}}$ of $F$, an
invalid ciphertext encrypted to $pk_{ID_{i_{t+1}}}$ must decrypt to a message distributed
negligibly close to uniform over the message space when decrypted with a secret
key chosen at random from $f_{i_{t+1}} \cdot \tilde{S}$. More precisely, an element of $\tilde{S}$ is chosen
uniformly at random, and the resulting matrix is multiplied on the left by $f_{i_{t+1}}$
to produce the secret key.


Computational Indistinguishability of Valid/Invalid Ciphertexts. Second, we
require that valid and invalid ciphertexts be computationally indistinguishable in the
following sense. For any fixed (distinct) $f_{i_1}, \ldots, f_{i_{t+1}}$, we consider the following
game between a challenger and an attacker $A$:

$\mathrm{Game}_{HP}$: The challenger starts by sampling $(sk_1, pk_1), \ldots, (sk_n, pk_n)$ as above,
and gives the attacker the public parameters and $pk_1, \ldots, pk_n$. The attacker may
adaptively choose distinct rows $f_{i_1}, \ldots, f_{i_{t+1}}$ in $F$ in any order it likes. (For
convenience, we let $f_{i_{t+1}}$ always denote the vector that will be encrypted under,
but we note that this may be chosen before some of the other $f_{i_j}$'s.) Upon setting
an $f_{i_j}$ for $j \neq t + 1$, the attacker receives $f_{i_j} \cdot S$. When it sets $f_{i_{t+1}}$, it also
chooses a message $m$. At this point, the challenger flips a coin $\beta \in \{0,1\}$, and
encrypts $m$ to the public key corresponding to $f_{i_{t+1}} \cdot S$ as follows. We let $pk_{ch}$
denote the public key corresponding to $f_{i_{t+1}} \cdot S$. If $\beta = 0$, it calls Encrypt with
$m, pk_{ch}$. If $\beta = 1$, it calls InvalidEncrypt with $m, f_{i_{t+1}} \cdot S$. It gives the resulting
ciphertext to the attacker, who produces a guess $\beta'$ for $\beta$.

We denote the advantage of the attacker by $\mathrm{Adv}^{HP}_A = \left|\Pr[\beta = \beta'] - \frac{1}{2}\right|$. We
require that $\mathrm{Adv}^{HP}_A$ be negligible for all PPT attackers $A$.


3.4 Construction 


Given a PKE scheme (KeyGen, Encrypt, Decrypt) and an identity mapping f 
having the properties defined above, we now construct a bounded-collusion IBE 
scheme. We let t denote our collusion parameter, and n will be the dimension of 
the image of f. 


Setup($\lambda$) $\to$ PP, MSK. The setup algorithm for the IBE scheme calls the key
generation algorithm of the PKE scheme to generate $n$ random pairs $(sk_1, pk_1), \ldots, (sk_n, pk_n)$,
sharing the same public parameters. The public parameters PP of the IBE
scheme are defined to be these shared public parameters as well as $pk_1, \ldots, pk_n$.
The master secret key MSK is the collection of secret keys $sk_1, \ldots, sk_n$.


KeyGen(ID, MSK) $\to$ $SK_{ID}$. The key generation algorithm takes an identity
in the domain of $f$ and first maps it into $R^n$ as $f(ID) = (id_1, \ldots, id_n)$. It then
computes $SK_{ID}$ as an $R$-linear combination of $sk_1, \ldots, sk_n$, with coefficients
$id_1, \ldots, id_n$: $SK_{ID} = \sum_{i=1}^{n} id_i \, sk_i$.


Encrypt(m, PP, ID) $\to$ CT. The encryption algorithm takes in a message in the
message space of the PKE scheme. From the public parameters PP, it computes a
public key corresponding to $SK_{ID}$ using the linear key homomorphism property
(we note that the mapping $f$ is known and efficiently computable). It then runs
the PKE encryption algorithm on $m$ with this public key to produce CT.


Decrypt(CT, $SK_{ID}$) $\to$ m. The decryption algorithm runs the decryption
algorithm of the PKE, using $SK_{ID}$ as the secret key.


3.5 Security 


Theorem 1. When a PKE scheme (KeyGen, Encrypt, Decrypt) with linear
key homomorphism and a compatible identity mapping $f$ satisfy the linear hash
proof property, then the construction defined in Section 3.4 is a secure
bounded-collusion IBE scheme with collusion parameter $t$.


Proof. We first change from the real security game defined in Section 2.1 to
a new Game' in which the challenger calls the invalid encryption algorithm to
form an invalid ciphertext. We argue that if the adversary's advantage changes
by a non-negligible amount, this violates the computational indistinguishability
of valid/invalid ciphertexts. To see this, we consider a PPT adversary $A$
whose advantage changes non-negligibly. We will construct a PPT adversary $A'$
against $\mathrm{Game}_{HP}$. The challenger for $\mathrm{Game}_{HP}$ gives $A'$ the public parameters and
$pk_1, \ldots, pk_n$, which $A'$ forwards to $A$. When $A$ requests a secret key for an
identity corresponding to $f_{i_j}$, $A'$ can forward $f_{i_j}$ to its challenger and obtain the
corresponding secret key. When $A$ declares $m_0, m_1$ and some $ID^*$ corresponding
to $f_{i_{t+1}}$, $A'$ chooses a random bit $b \in \{0,1\}$ and sends $m_b, f_{i_{t+1}}$ to its
challenger. It receives a ciphertext encrypting $m_b$, which it forwards to $A$. We note
here that the $t + 1$ distinct identities chosen by $A$ correspond to distinct rows of
$F$. If the challenger for $A'$ is calling the regular encryption algorithm, then $A'$
has properly simulated the real security game for $A$. If it is calling the invalid
encryption algorithm, then $A'$ has properly simulated the new game, Game'.
Hence, if $A$ has a non-negligible change in advantage, $A'$ can leverage this to
obtain a non-negligible advantage in $\mathrm{Game}_{HP}$.

In Game', we argue that information-theoretically, the attacker's advantage
must be negligible. We observe that in our definition of the linear hash proof
property, the subset $\tilde{S}$ of $R^{n \times d}$ is precisely the subset of possible MSKs that
are consistent with the public parameters and requested secret keys that the
attacker receives in the game, and each of these is equally likely. Since the invalid
ciphertext decrypts to an essentially random message over this set (endowed with
the uniform distribution), the attacker cannot have a non-negligible advantage
in distinguishing the message.


4 QR-Based Construction 


We now present a PKE scheme with linear key homomorphism and a compatible 
identity mapping f such that this is a linear hash proof system with respect to 
f under the quadratic residuosity assumption. 


QR-based PKE Construction. We define the message space to be $\{-1, 1\}$. The
public parameters of the scheme are a Blum integer $N = pq$, where the primes
$p, q \equiv 3 \pmod 4$ and $QR_N$ is cyclic, and an element $g$ that is a random quadratic
residue modulo $N$. Our public keys will be elements of $\mathbb{Z}_N$, while our secret keys
are elements of the ring $R := \mathbb{Z}$. We define the subset $R'$ to be $[\rho(N)]$. We will
later provide bounds for appropriate settings of $\rho(N)$.


— Gen($1^\lambda$): The generation algorithm chooses an element $sk$ uniformly at
random in $[\rho(N)]$. This is the secret key. It then calculates the public key as
$pk = g^{sk}$.

— $\mathrm{Enc}_{pk}(m)$: The encryption algorithm chooses an odd $r \in [N^2]$ uniformly at
random, and calculates $\mathrm{Enc}(m) = (g^r, m \cdot pk^r)$.

— $\mathrm{Dec}_{sk}(c_1, c_2)$: The decryption algorithm computes $m = c_2 \cdot (c_1^{sk})^{-1}$.


We additionally define the invalid encryption algorithm:


— $\mathrm{InvalidEnc}_{sk}(m)$: The invalid encryption algorithm chooses a random $h \in
J_N \setminus QR_N$ (i.e. a random non-square). It produces the invalid ciphertext as
$(h, m \cdot h^{sk})$.
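
The four algorithms above can be exercised on a toy modulus. This is a minimal sketch, not a secure implementation: the modulus $N = 77$, the bound `RHO` standing in for $\rho(N)$, and all function names are our assumptions. It samples $h \in J_N \setminus QR_N$ as $-u$ for a random square $u$, which is valid because $J_N \cong \{+1, -1\} \times QR_N$.

```python
import random
from math import gcd

P, Q = 7, 11                 # toy strong primes (7 = 2*3 + 1, 11 = 2*5 + 1)
N = P * Q                    # Blum integer with cyclic QR_N
RHO = N ** 2                 # hypothetical choice of the key bound rho(N)

def rand_square():
    """Uniform element of QR_N: square a uniform unit modulo N."""
    while True:
        x = random.randrange(1, N)
        if gcd(x, N) == 1:
            return x * x % N

g = rand_square()            # public parameter: a random quadratic residue

def gen():
    sk = random.randrange(RHO)            # sk uniform in [rho(N)]
    return sk, pow(g, sk, N)              # pk = g^sk mod N

def enc(pk, m):
    r = random.randrange(1, N * N, 2)     # odd r in [N^2]
    return pow(g, r, N), m * pow(pk, r, N) % N

def invalid_enc(sk, m):
    h = N - rand_square()                 # -u: Jacobi symbol 1, but not a square
    return h, m * pow(h, sk, N) % N

def dec(sk, ct):
    c1, c2 = ct
    m = c2 * pow(c1, -sk, N) % N          # c2 * (c1^sk)^(-1) mod N (Python 3.8+)
    return 1 if m == 1 else -1            # the residue N - 1 represents -1

sk, pk = gen()
roundtrip = all(dec(sk, enc(pk, m)) == m for m in (1, -1))
# Under the very key used to form it, an invalid ciphertext still decrypts to m.
invalid_roundtrip = all(dec(sk, invalid_enc(sk, m)) == m for m in (1, -1))
```

Note that an invalid ciphertext only looks random when decrypted with a different secret key that is consistent with the same public key; with the generating key it decrypts correctly, as the last line checks.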


Key Homomorphism. Considering $N, g$ as global parameters and only $pk = g^{sk}$
as the public key, we have homomorphism over keys through multiplication and
exponentiation in $G$ for public keys and arithmetic over the integers for secret
keys.

For secret keys $sk_1, sk_2 \in \mathbb{Z}$ and integers $a, b \in \mathbb{Z}$, we can form the secret key
$sk_3 := a \, sk_1 + b \, sk_2$ and corresponding public key $pk_3 = pk_1^a \cdot pk_2^b$ in $G$.
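
As a quick numerical check of this homomorphism, the following sketch assumes the toy modulus $N = 77$ and the fixed square $g = 4$ (chosen for reproducibility rather than sampled at random); the coefficient values are arbitrary.

```python
import random

N = 7 * 11                    # toy Blum integer
g = 4                         # 2^2 mod 77, a quadratic residue

sk1 = random.randrange(N * N)
sk2 = random.randrange(N * N)
pk1, pk2 = pow(g, sk1, N), pow(g, sk2, N)

a, b = 3, -5                  # integer coefficients; negative values are allowed
sk3 = a * sk1 + b * sk2       # combination of secret keys, taken over the integers
pk3 = pow(pk1, a, N) * pow(pk2, b, N) % N   # same combination "in the exponent"

homomorphism_holds = pk3 == pow(g, sk3, N)
```

The combined public key is computed from `pk1`, `pk2`, `a` and `b` alone, which is what lets an encryptor derive per-identity public keys without any secret.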


4.1 Compatible Mapping and Resulting IBE Construction 


Our compatible map $f$ is obtained from Lemma 2 (Section 2.3). We may assume
that our identities are hashed to $\{0,1\}^k$ for some $k$ using a collision-resistant
hash function, so they are in the domain of $f$. The image of each identity under
$f$ is a vector with 0,1 entries of length $n = k(t + 1)$, where $t$ is our collusion
parameter. For every $t + 1$ distinct elements of $\{0,1\}^k$, their images under $f$ are
linearly independent (over $\mathbb{Z}_2$ as well as $\mathbb{Q}$).

A formal description of our construction follows. This is an instance of the
general construction in Section 3.4, but we state it explicitly here for the reader's
convenience. We assume that messages to be encrypted are elements of $\{-1, +1\}$,
and identities are elements of $\{0,1\}^k$. For each identity ID, we let $\vec{ID}$ denote
the row vector of length $n$ over $\{0,1\}$ obtained by our mapping from $\{0,1\}^k$ to
binary vectors of length $n$.


Setup. The setup algorithm chooses a Blum integer $N$ such that $QR_N$ is cyclic
and a random element $g \in QR_N$. It then generates $n$ key pairs of the PKE
$((pk_1, sk_1), (pk_2, sk_2), \ldots, (pk_n, sk_n))$ using the common $g$, and publishes the public
keys (along with $N, g$) as the public parameters. The master secret key consists
of the corresponding secret keys, $sk_1, \ldots, sk_n$. These form an $n \times 1$ vector $S$ with
entries in $[\rho(N)]$ (the $i$-th component of $S$ is equal to $sk_i$ for $i = 1, \ldots, n$).


KeyGen(ID). The key generation algorithm receives an $ID \in \{0,1\}^k$. By
Lemma 2 (Section 2.3), we then have a mapping $f$ that takes this ID to a vector
$(id_1, id_2, \ldots, id_n)$, such that the vectors corresponding to $t + 1$ different IDs are
linearly independent. The secret key for ID will be an element of $\mathbb{Z}$, which is
computed as a linear combination of the values $sk_1, \ldots, sk_n$, with coefficients
$id_1, \ldots, id_n$ respectively. We express this as $SK_{ID} := \sum_{i=1}^{n} (sk_i \cdot id_i)$, where the
sum is taken over $\mathbb{Z}$. Since the mapping $f$ provided in Section 2.3 produces vectors
$(id_1, \ldots, id_n)$ with 0,1 entries, the value of $SK_{ID}$ is at most $\rho(N) \, n$. Since $n$
will be much less than $\rho(N)$, this will require roughly $\log \rho(N)$ bits to represent.


Encrypt(ID, m, PP). We let $PK_{ID} := \prod_{i=1}^{n} pk_i^{id_i}$. Anyone can compute this using
the multiplicative key homomorphism and the published $pk_i$ values. Since by the
key homomorphism $(PK_{ID}, SK_{ID})$ is still a valid keypair for the original PKE,
encryption and decryption can function as for the PKE. In other words, the
encryptor runs the encryption algorithm for the PKE scheme with $PK_{ID}$ as the
public key to produce the ciphertext CT.

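Note that for ciphertexts, we now have

$$\mathrm{Enc}_{PK_{ID}}(m) = \left(g^r,\ m \cdot (PK_{ID})^r\right) = \left(g^r,\ m \cdot \prod_{i=1}^{n} pk_i^{r \cdot id_i}\right) = \left(g^r,\ m \cdot g^{r \sum_{i=1}^{n} sk_i \cdot id_i}\right).$$

All arithmetic here takes place modulo $N$.

This can alternately be expressed as $\mathrm{Enc}_{PK_{ID}}(m) = \left(g^r,\ m \cdot g^{r (\vec{ID} \cdot S)}\right)$, where
$\vec{ID}$ is the row vector $f(ID)$ and $S = (sk_i)_{n \times 1}$ is a vector over $\mathbb{Z}$ containing
the $n$ PKE secret keys of the master secret key.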


Decrypt(CT, $SK_{ID}$). The decryption algorithm runs the decryption algorithm
of the PKE with $SK_{ID}$ as the secret key.
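
The four IBE algorithms can be put together in a few lines. The sketch below is illustrative only: it assumes the toy modulus $N = 77$ with $g = 4$, a hypothetical bound `RHO` for $\rho(N)$, and, in place of the map $f$ of Lemma 2, a hard-coded table `F` of 0/1 identity vectors that are linearly independent over $\mathbb{Z}_2$.

```python
import random

N = 7 * 11                    # toy Blum integer
g = 4                         # fixed quadratic residue, for reproducibility
RHO = N ** 2                  # hypothetical choice of rho(N)
n = 4                         # dimension of the identity vectors

# Hypothetical stand-in for f: fixed 0/1 rows, independent over Z_2.
F = {"alice": [1, 0, 0, 1], "bob": [0, 1, 0, 1], "carol": [0, 0, 1, 1]}

def setup():
    msk = [random.randrange(RHO) for _ in range(n)]      # S = (sk_1, ..., sk_n)
    pp = [pow(g, sk, N) for sk in msk]                   # pk_i = g^(sk_i)
    return pp, msk

def keygen(msk, identity):
    # SK_ID = sum_i sk_i * id_i, taken over the integers
    return sum(sk * bit for sk, bit in zip(msk, F[identity]))

def public_key(pp, identity):
    # PK_ID = prod_i pk_i^(id_i) mod N
    pk = 1
    for pk_i, bit in zip(pp, F[identity]):
        if bit:
            pk = pk * pk_i % N
    return pk

def encrypt(pp, identity, m):
    r = random.randrange(1, N * N, 2)                    # odd r in [N^2]
    return pow(g, r, N), m * pow(public_key(pp, identity), r, N) % N

def decrypt(sk_id, ct):
    c1, c2 = ct
    m = c2 * pow(c1, -sk_id, N) % N                      # Python 3.8+
    return 1 if m == 1 else -1

pp, msk = setup()
all_correct = all(
    decrypt(keygen(msk, ident), encrypt(pp, ident, m)) == m
    for ident in F for m in (1, -1)
)
keys_match = all(public_key(pp, i) == pow(g, keygen(msk, i), N) for i in F)
```

The check `keys_match` confirms that $(PK_{ID}, SK_{ID})$ is a valid key pair of the underlying PKE, which is exactly what the key homomorphism provides.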


4.2 Security of the IBE 


We now prove security of the IBE scheme up to $t$ collusions. This will follow from
Theorem 1 and the theorem below.


Theorem 2. Under the QR assumption, the PKE construction in Section 4 is a
linear hash proof system with respect to $f$ when $\rho(N)$ is sufficiently large. When
$\log(N) = \Omega(n^2 \log n)$, $\rho(N) = N^{\ell}$ for some constant $\ell$ suffices.


We note that when $\rho(N) = N^{\ell}$, our secret keys are of size $O(\log N) = O(\lambda)$. We
prove this theorem in two lemmas.


Lemma 3. Under the QR assumption, computational indistinguishability of valid
and invalid ciphertexts holds.


Proof. We suppose there exists a PPT adversary $A$ with non-negligible
advantage in $\mathrm{Game}_{HP}$. We will create a PPT algorithm $B$ with non-negligible
advantage against the QR assumption. We simplify/abuse notation a bit by letting
$f_1, \ldots, f_{t+1}$ denote the distinct rows of $F$ that are chosen adaptively by $A$ during
the course of the game (these were formerly called $f_{i_1}, \ldots, f_{i_{t+1}}$).

$B$ is given $(N, h)$, where $N$ is a Blum integer such that $QR_N$ is cyclic and $h$ is
either a random element of $J_N \setminus QR_N$ or a random element of $QR_N$. Crucially, $B$
does not know the factorization of $N$. $B$ sets $g$ to be a random element of $QR_N$.

It chooses an $n \times 1$ vector $S = (sk_i)$, whose entries are chosen uniformly
at random from $[\rho(N)]$. For each $i$ from 1 to $n$, the $i$-th entry of $S$ is denoted
by $sk_i$. It computes $pk_i = g^{sk_i} \bmod N$ and gives the public parameters PP =
$(N, g, pk_1, \ldots, pk_n)$ to $A$. We note that $B$ knows the MSK = $S$, so it can compute
$f_1 \cdot S, \ldots, f_t \cdot S$ and give these to $A$ whenever $A$ chooses the vectors $f_1, \ldots, f_t$.

At some point, $A$ declares a message $m$ and a vector $f_{t+1}$ corresponding to
identity $ID^*$. $B$ encrypts $m$ using the following ciphertext: $(h, m \cdot h^{f_{t+1} \cdot S})$.
We consider two cases, depending on the distribution of $h$.


Case 1: $h$ is random in $QR_N$. When $h$ is a random square modulo $N$, we claim
that the ciphertext is properly distributed as a valid ciphertext. More precisely,
we claim that the distribution of $h$ and the distribution of $g^r$ for a random odd
$r \in [N^2]$ are negligibly close. This follows from the fact that $QR_N$ is cyclic of
order $\frac{\phi(N)}{4}$ and the reduction of a randomly chosen odd $r \in [N^2]$ modulo $\frac{\phi(N)}{4}$
will be distributed negligibly close to uniform.


Case 2: $h$ is random in $J_N \setminus QR_N$. In this case, $B$ has followed the specification
of the invalid encryption algorithm.

Thus, if $A$ has a non-negligible advantage in distinguishing between valid and
invalid ciphertexts, then $B$ can leverage $A$ to obtain non-negligible advantage
against the QR assumption.


Lemma 4. Uniform decryption of invalid ciphertexts holds when $\rho(N)$ is
sufficiently large. When $\log(N) = \Omega(n^2 \log n)$, $\rho(N) = N^{\ell}$ for some constant $\ell$
suffices.


Proof. We choose $S$ with uniformly random entries in $[\rho(N)]$. We then fix any
$t + 1$ distinct rows of $F$, denoted by $f_1, \ldots, f_{t+1}$. We must argue that the value
of $f_{t+1} \cdot S$ modulo 2 is negligibly close to uniform, conditioned on $f_1 \cdot S, \ldots, f_t \cdot S$
and $S$ modulo $\frac{\phi(N)}{4}$. To see why this is an equivalent statement of the uniform
decryption of invalid ciphertexts property for our construction, note that the
decryption of an invalid ciphertext is computed as follows. We let $sk$ denote the
secret key the ciphertext was generated with, and $sk^*$ denote another secret key
for the same public key used for decryption: $\mathrm{Dec}(sk^*, (h, m h^{sk})) = m(-1)^{sk - sk^*}$,
since $sk \equiv sk^* \bmod \phi(N)/4$ in order for both to have the same public key. (Here
$h^{\phi(N)/4} = -1$, because $h = -u$ for some square $u$ and $\phi(N)/4$ is odd.) If we think
of $S$ as fixed and $\tilde{S}$ as the set of vectors with entries in $[\rho(N)]$ that yield the
same values of $f_1 \cdot S, \ldots, f_t \cdot S$ and $S$ modulo $\frac{\phi(N)}{4}$, we can restate our goal as
showing that the distribution of $f_{t+1} \cdot S' \bmod 2$ is negligibly close to uniform,
where $S'$ is chosen uniformly at random from $\tilde{S}$.
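
The decryption identity used in this proof can be observed directly. The following minimal sketch assumes the toy modulus $N = 77$ (so $\phi(N)/4 = 15$), the square $g = 4$, and the non-square $h = -9 \bmod 77$ with Jacobi symbol 1; the specific key value is arbitrary.

```python
N = 7 * 11
phi = 6 * 10
quarter = phi // 4            # phi(N)/4 = 15, the (odd) order of QR_N
g = 4                         # a square modulo N
h = N - 9                     # -(3^2): Jacobi symbol 1, but not a square

def dec(sk, ct):
    c1, c2 = ct
    m = c2 * pow(c1, -sk, N) % N          # c2 * (c1^sk)^(-1) mod N
    return 1 if m == 1 else -1

sk = 1234                     # key used to form the invalid ciphertext
m = 1
ct = (h, m * pow(h, sk, N) % N)           # InvalidEnc_sk(m)

results = {}
for k in range(4):
    sk_star = sk + k * quarter            # sk* = sk mod phi(N)/4
    same_public_key = pow(g, sk_star, N) == pow(g, sk, N)
    results[k] = (same_public_key, dec(sk_star, ct))
```

Every `sk_star` yields the same public key, yet the decrypted value flips sign with the parity of `k`, that is, with the parity of $sk - sk^*$. This is why the proof reduces to showing that $f_{t+1} \cdot S' \bmod 2$ is close to uniform.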

We know by Lemma 2 that the vectors $f_1, \ldots, f_{t+1}$ are linearly independent
as vectors over $\mathbb{Z}_2$. This implies that these vectors are linearly independent as
vectors over $\mathbb{Q}$ as well. We let $\mathrm{Ker}_{\mathbb{Q}}(f_1, \ldots, f_t)$ denote the $(n - t)$-dimensional
kernel of these vectors as a subspace of $\mathbb{Q}^n$.

Our strategy is to prove that this space contains a vector $\mathbf{p}$ with integer entries
that is not orthogonal to $f_{t+1}$ modulo 2. Then, for every $S'$ in $S + W$, $S' + \frac{\phi(N)}{4}\mathbf{p}$
is also in $S + W$. Here we are using the notation from Section 3.3, where we defined
$W$. In this instance, $S + W$ is the set of vectors yielding the same values as $S$ for
$f_1 \cdot S, \ldots, f_t \cdot S$ and $S$ modulo $\frac{\phi(N)}{4}$. $\tilde{S}$ is then the intersection of $S + W$ with
the set of vectors having all of their entries in $[\rho(N)]$.

To complete the argument, we need to prove that for most elements $S' \in \tilde{S}$
(all but a negligible proportion), $S' + \frac{\phi(N)}{4}\mathbf{p}$ will also be in $\tilde{S}$ (i.e. have entries
in $[\rho(N)]$). This will follow from showing that there exists a $\mathbf{p}$ with reasonably
bounded entries, and also that the set $\tilde{S}$ contains mostly vectors whose entries
stay a bit away from the boundaries of the region $[\rho(N)]$.

We will use the following lemmas. The proof of the second can be found in the
full version.


Lemma 5. Let $A$ be a $t \times n$ matrix of rank $t$ over $\mathbb{Q}$ with entries in $\{0,1\}$. Then
there exists a basis for the kernel of $A$ consisting of vectors with integral entries
all bounded by $n^{t/2} t^{t/4}$.


Proof. This is an easy consequence of Theorem 2 in [3], which implies the
existence of a basis with entries all bounded in absolute value by $\sqrt{\det(AA^T)}$. We
note that $AA^T$ is a $t \times t$ matrix with integral entries between 0 and $n$. Dividing
each row by $n$, we obtain a matrix with rational entries between 0 and 1, and
can then apply Hadamard's bound [23] to conclude that the determinant of this
rational matrix has absolute value at most $t^{t/2}$. Thus, the determinant of $AA^T$
has absolute value at most $n^t t^{t/2}$. Applying Theorem 2 in [3], the lemma follows.
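
The determinant bound in this proof can be sanity-checked numerically. The sketch below assumes the reading $\det(AA^T) \le n^t t^{t/2}$ of the bound and tests it on random 0/1 matrices with exact rational arithmetic; the helper names `det` and `gram` and the toy sizes are ours. It checks only the Hadamard step, not the kernel-basis statement of Theorem 2 in [3].

```python
from fractions import Fraction
import random

def det(mat):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in mat]
    size, d = len(m), Fraction(1)
    for c in range(size):
        piv = next((r for r in range(c, size) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return d

def gram(A):
    """The t x t matrix A * A^T."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in A] for r1 in A]

t, n = 3, 6
bound_holds = True
for _ in range(200):
    A = [[random.randint(0, 1) for _ in range(n)] for _ in range(t)]
    d = det(gram(A))
    # d <= n^t * t^(t/2) is compared in squared form to stay in exact arithmetic.
    bound_holds = bound_holds and d >= 0 and d * d <= n ** (2 * t) * t ** t
```

Taking the square root of the determinant bound gives the entry bound $n^{t/2} t^{t/4}$ stated in Lemma 5.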


Lemma 6. We suppose that $M$ is a $d \times n$ matrix with integral entries all of
absolute value at most $B$ and rank $d$ over $\mathbb{Q}$. Then there exists another $d \times n$ matrix
$M'$ with integral entries of absolute value at most $2^{d-1}B$ that has the same
rowspan as $M$ over $\mathbb{Q}$ and furthermore remains rank $d$ when its entries are reduced
modulo 2.


Combining these two lemmas, we may conclude that there exists a basis for
$\mathrm{Ker}_{\mathbb{Q}}(f_1, \ldots, f_t)$ with integral entries all having absolute value at most $C :=
2^{n-t-1} n^{t/2} t^{t/4}$ that remains of rank $n - t$ when reduced modulo 2. Now, if all of
these basis vectors are orthogonal to $f_{t+1}$ modulo 2, then these form an $(n - t)$-
dimensional space that is contained in the kernel of the $(t + 1)$-dimensional space
generated by $f_1, \ldots, f_t, f_{t+1}$ in $\mathbb{Z}_2^n$. This is a contradiction, since that kernel has
dimension only $n - t - 1$. Thus, at least one
of the basis vectors is not orthogonal to $f_{t+1}$ modulo 2. Since it is orthogonal
to $f_1, \ldots, f_t$ over $\mathbb{Q}$ and has integral entries of absolute value at most $C$, this is
our desired $\mathbf{p}$.
Now, the set of vectors $\tilde{S}$ can be described as the intersection of the set

$$S + \frac{\phi(N)}{4} \, \mathrm{Ker}_{\mathbb{Z}}(f_1, \ldots, f_t)$$

with the set of vectors with coordinates all in $[\rho(N)]$, where $\mathrm{Ker}_{\mathbb{Z}}(f_1, \ldots, f_t)$
denotes the vectors in $\mathrm{Ker}_{\mathbb{Q}}(f_1, \ldots, f_t)$ with integral entries. Since we have a
bound $C$ on the size of entries of an integer basis for the kernel, we can argue
that if the coordinates of $S$ are sufficiently bounded away from 0 and $\rho(N)$,
then there will be many vectors in $\tilde{S}$, negligibly few of which themselves have
entries outside of $\left(\frac{\phi(N)}{4} C,\ \rho(N) - \frac{\phi(N)}{4} C\right)$. Both this bound and the probability
that $S$ is indeed sufficiently bounded away from 0 and $\rho(N)$ depend only on the
relationship between $n$ and $\rho(N)$. In the full version of this paper, we prove the
following lemma:


Lemma 7. With $\rho(N), n, \mathbf{p}, S$, and $\tilde{S}$ defined as above, when $\log N = \Omega(n^2 \log n)$,
we can set $\rho(N) = N^{\ell}$ for some constant $\ell$ so that the fraction of $S' \in \tilde{S}$ such that
$S' + \frac{\phi(N)}{4}\mathbf{p}$ is not also in $\tilde{S}$ is negligible with all but negligible probability over the
choice of $S$.

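Thus, ignoring negligible factors, we can consider $\tilde{S}$ as partitioned into pairs
of the form $S'$ and $S' + \frac{\phi(N)}{4}\mathbf{p}$. For each $S'$, the values of $f_{t+1} \cdot S'$ and
$f_{t+1} \cdot \left(S' + \frac{\phi(N)}{4}\mathbf{p}\right)$ modulo 2 are different, because $\frac{\phi(N)}{4}$ is odd and
$f_{t+1} \cdot \mathbf{p}$ is odd. Thus, the distribution of $f_{t+1} \cdot S' \bmod 2$
over $S' \in \tilde{S}$ is sufficiently close to uniform.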


5 Open Problems 


It remains to find additional constructions within this framework based on other 
assumptions; in particular, lattice-based constructions may be possible. It would 
also be interesting to extend this framework to accommodate stronger secu- 
rity requirements, such as CCA-security. Finally, constructing a fully collusion- 
resistant IBE from the QR assumption in the standard model remains a chal- 
lenging open problem. 


Abstract. We propose a general construction of deterministic encryption
schemes that unifies prior work and gives novel schemes. Specifically,
its instantiations provide:


— A construction from any trapdoor function that has sufficiently many
hardcore bits.

— A construction that provides “bounded” multi-message security from
lossy trapdoor functions.


The security proofs for these schemes are enabled by three tools that are 
of broader interest: 


— A weaker and more precise sufficient condition for semantic security
on a high-entropy message distribution. Namely, we show that to
establish semantic security on a distribution $M$ of messages, it suffices
to establish indistinguishability for all conditional distributions $M|E$,
where $E$ is an event of probability at least 1/4. (Prior work required
indistinguishability on all distributions of a given entropy.)

— A result about computational entropy of conditional distributions.
Namely, we show that conditioning on an event $E$ of probability $p$
reduces the quality of computational entropy by a factor of $p$ and its
quantity by $\log_2 1/p$.

— A generalization of leftover hash lemma to correlated distributions. 

We also extend our result about computational entropy to the average
case, which is useful in reasoning about leakage-resilient cryptography:
leaking $\lambda$ bits of information reduces the quality of computational
entropy by a factor of $2^{\lambda}$ and its quantity by $\lambda$.


1 Introduction 


Public-key cryptosystems require randomness: indeed, if the encryption oper- 
ation is deterministic, the adversary can simply use the public key to verify 
that the ciphertext c corresponds to its guess of the plaintext m by encrypt- 
ing m. However, such an attack requires the adversary to have a reasonably 
likely guess for m in the first place. Recent results on deterministic public-key 
encryption (DE) (building on work in the information-theoretic symmetric-key 
setting [38,17,14]) have studied how to achieve security when the randomness
comes only from $m$ itself [3,5,7,27,8,40]. DE has a number of practical
applications, such as efficient search on encrypted data and securing legacy protocols
(cf. [3]). It is also interesting from a foundational standpoint; indeed, its study 
has proven useful in other contexts: Bellare et al. [4] showed how it extends to 
a notion of “hedged” public-key encryption that reduces dependence on exter- 
nal randomness for probabilistic encryption more generally, and Dent et al. [13] 
adapted its notion of privacy to a notion of confidentiality for digital signatures. 

However, our current understanding of DE is somewhat lacking. The
constructions of [3,5,7,27], as well as their analysis techniques, are rather disparate, and
some natural questions arise from them. Namely, does the scheme of [5]
inherently require using the Goldreich-Levin hardcore bit? Can it be made to work
with trapdoor functions rather than permutations? Is the single-message security
that these constructions achieve an inherent limitation of standard-model (i.e.,
non-random-oracle) schemes? In this work our main goal is to provide a unified framework
for the construction of DE and to shed light on these questions.


1.1 Our Results 


A SCHEME BASED ON TRAPDOOR FUNCTIONS. We propose a general
Encrypt-with-Hardcore (EwHCore) construction of DE from trapdoor functions (TDFs),
which generalizes the basic idea behind the schemes of [3,5] and leads to a unified
framework for the construction of DE. Let $f$ be a TDF with a hardcore function
hc, and let $\mathcal{E}$ be any probabilistic public-key encryption algorithm. Our scheme
encrypts an input message $x$ by computing $y = f(x)$ and then encrypting $y$
using $\mathcal{E}$ with hc($x$) as the coins; that is, the encryption of $x$ is $\mathcal{E}(f(x); \mathrm{hc}(x))$.
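
The shape of the Encrypt-with-Hardcore construction can be sketched as follows. Everything concrete here is an assumption made for illustration and is insecure at these sizes: textbook RSA with modulus 3233 stands in for the TDF $f$, SHA-256 stands in for a random-oracle style hc, and a toy ElGamal over a 127-bit prime field stands in for $\mathcal{E}$.

```python
import hashlib

# Toy stand-in for the TDF f: textbook RSA with N = 61 * 53 (insecure).
RSA_N, RSA_E, RSA_D = 3233, 17, 2753

def f(x):
    return pow(x, RSA_E, RSA_N)

def f_inverse(y):
    return pow(y, RSA_D, RSA_N)           # uses the trapdoor RSA_D

# Toy stand-in for hc: a hash playing the role of a random oracle.
def hc(x):
    return hashlib.sha256(x.to_bytes(16, "big")).digest()

# Toy stand-in for the randomized scheme E: ElGamal over Z_P^*.
P = 2 ** 127 - 1                          # a Mersenne prime
G = 3
ELG_SK = 123456789
ELG_PK = pow(G, ELG_SK, P)

def E(pk, y, coins):
    """Randomized encryption whose only randomness is the explicit `coins`."""
    r = int.from_bytes(coins, "big") % (P - 1)
    return pow(G, r, P), y * pow(pk, r, P) % P

def E_decrypt(sk, ct):
    c1, c2 = ct
    return c2 * pow(c1, -sk, P) % P       # Python 3.8+

def ewhcore_encrypt(pk, x):
    """Deterministic encryption: E(f(x); hc(x))."""
    return E(pk, f(x), hc(x))

def ewhcore_decrypt(sk, ct):
    return f_inverse(E_decrypt(sk, ct))

c_first = ewhcore_encrypt(ELG_PK, 42)
c_second = ewhcore_encrypt(ELG_PK, 42)
is_deterministic = c_first == c_second
recovered = ewhcore_decrypt(ELG_SK, c_first)
```

Because the coins are derived from the message itself, encrypting the same $x$ twice gives the same ciphertext, and decryption first undoes $\mathcal{E}$ and then inverts $f$ with its trapdoor.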

Intuitively, this scheme requires that the output of hc be sufficiently long to
provide enough random coins for $\mathcal{E}$ (in fact, it need only be sufficiently long to be
used as a seed for a pseudorandom generator), and that it not reveal any partial
information about $x$ (because $\mathcal{E}$ does not necessarily protect the privacy of its
random coins). There are two nontrivial technical steps needed to make this intuition
precise. First, we define a condition required of hc (which we call “robustness”)
and show that it is sufficient for security of the resulting DE. Second, through
a computational entropy argument, we show how to make any sufficiently long
hc robust by applying a randomness extractor.

This general scheme admits a number of instantiations depending on $f$ and
hc. For example, when $f$ is any trapdoor function and hc is a random oracle
(RO), we obtain the construction of [3]¹. When $f$ is an iterated trapdoor
permutation (TDP) and hc is a collection of Goldreich-Levin (GL) bits extracted
at each iteration, we obtain the construction of [5]. When $f$ is a lossy trapdoor
function (LTDF) [35] and hc is a pairwise-independent hash, we get a variant
of the construction of [7] (which is less efficient but has a more straightforward
analysis). We also obtain a variant of the construction of Hemenway et al. [27]

1 Technically, this construction does not even need a TDF because of the random 
oracle model; however, it may be prudent to use a TDF because then it seems more 
likely that the instantiation of the random oracle will be secure as it may be hardcore 
for the TDF. 
under the same assumption as they use (see Section 5.2 for details). Note that in
all but the last of these cases, the hardcore function is already robust (without
requiring an extractor), which shows that in prior work this notion played an
implicit role.

Moreover, this general scheme not only explains past constructions, but also 
gives us new ones. Specifically, if f is a trapdoor function with enough hardcore 
bits, we obtain: 


• DE that works on the uniform distribution of messages;


• DE that works on any distribution of messages whose min-entropy is at
most logarithmically smaller than maximum possible;


• assuming sufficient hardness distinguishing the output of hc from uniform
(so in particular of inverting $f$), DE that works on even-lower entropy
message distributions.


Prior results require more specific assumptions on the trapdoor function (such as 
assuming that it is a permutation or that it is lossy—both of which imply enough 
hardcore bits) in order to get constructions that work even just on the uniform 
distribution of messages. Furthermore, our results yield more efficient schemes 
(though sometimes under stronger assumptions) even in the permutation case, 
by avoiding iteration. 

Notably, we obtain the first DE scheme without random oracles based on the
hardness of syndrome decoding using the Niederreiter trapdoor function [32],
which was shown to have linearly many hardcore bits by Freeman et al. [19]
(and moreover to be “correlated input” secure) but is not known to be lossy. (A
scheme in the random oracle model follows from [3].) Additionally, the RSA [37]
and Paillier [34] trapdoor permutations have linearly many hardcore bits
under certain computational assumptions (the “Small Solutions RSA” [39] and
“Bounded Computational Composite Residuosity” [9] assumptions respectively).
Therefore, we can use these TDPs to instantiate our scheme efficiently under the
same computational assumptions. Before our work, DE schemes from RSA and
Paillier either required many iterations [5] or decisional assumptions that imply
lossiness of these TDPs [30,19,7].


SECURITY FOR MULTIPLE MESSAGES: DEFINITION AND CONSTRUCTION. An
important caveat is that, as in [5,7], we can prove the above standard-model DE
schemes secure only for the encryption of a single high-entropy plaintext, or,
what was shown equivalent in [7], an unbounded number of messages drawn
from a block source [10], where each subsequent message brings “fresh” entropy.
On the other hand, the strongest and most practical security model for DE
introduced by [8] considers the encryption of an unbounded number of plaintexts
that have individual high entropy but may not have any conditional entropy.
In order for EwHCore to achieve this, the hardcore function hc must also be
robust on correlated inputs. (A general study of correlated-input security for the
case of hash functions rather than hardcore functions was concurrently initiated
in [25].) In particular, it follows from the techniques of [3] that a RO hash
satisfies such a notion. This leads to a multi-message secure scheme in the RO model
(as obtained in [3]). We thus have a large gap between what is (known to be)
achievable with random oracles versus in the standard model.

To help bridge this gap, we propose a notion of “q-bounded” security for DE,
where up to q high-entropy but arbitrarily correlated messages may be encrypted
under the same public key (whose size may depend polynomially on q). We feel
that if one is limited to the standard model, this notion is useful. Indeed, it seems
that the requirement of previous results in the standard model (that messages
come from a block source) may be difficult to guarantee: all that’s needed to
violate it is a single message that has low conditional entropy. Following [7], we
also extend our security definition to unbounded multi-message security where
messages are drawn from what we call a “q-block source” (essentially, a block
source where each “block” consists of q messages which may be arbitrarily
correlated but have individual high entropy); Theorem 4.2 of [7] extends to show
that q-bounded multi-message security and unbounded multi-message security
for q-block sources are equivalent for a given min-entropy.

Using our EwHCore construction and a generalization of the leftover hash
lemma discussed below, we show q-bounded DE schemes (for long enough
messages), for any polynomial q, based on LTDFs losing a 1 - O(1/q) fraction
of the input. It is known how to build such LTDFs from the decisional
Diffie-Hellman [35], d-linear [19], and decisional composite residuosity [7,19]
assumptions.


1.2 Our Tools 


Our results are enabled by three tools that may be of more general applicability. 


A MORE PRECISE CONDITION FOR SECURITY OF DE. We revisit the definitional
equivalences for DE proven by [5] and [7]. At a high level, they showed that
the semantic-security-style definition for DE (called PRIV) introduced in the
initial work of [3], which asks that a scheme hides all public-key-independent²
functions of messages drawn from some distribution, is in some sense equivalent
to an indistinguishability-based notion for DE (called IND), which asks that it is
hard to distinguish ciphertexts of messages drawn from one of two possible
distributions. Notice that while PRIV can be meaningfully said to hold for a given
message distribution, IND inherently talks of pairs of distributions. The works
of [5,7] compensated for this by giving an equivalence in terms of min-entropy.
That is, they showed that PRIV for all message distributions of min-entropy μ is
implied by indistinguishability with respect to all pairs of plaintext distributions
of min-entropy slightly less than μ.

We demonstrate a more precise equivalence that, for a fixed distribution M, 
identifies a class of pairs of distributions such that if IND holds on those pairs, 
then PRIV holds on M. By re-examining the equivalence proof of [5], we show 
that PRIV on M is implied by IND on all pairs of “slightly induced” distributions 
of M | E, where E is an arbitrary event of probability at least 1/4. 


2 As shown in [3], the restriction to public-key independent functions is inherent here.

This first tool is needed to argue that “robustness” of hc is sufficient for
security of EwHCore (essentially, a robust hardcore function is one that remains
hardcore on a slightly induced distribution³).


CONDITIONAL COMPUTATIONAL ENTROPY. We investigate how conditioning re- 
duces computational entropy of a random variable X. Suppose you have a distri- 
bution that has computational entropy (such as the pair f(r), hc(r) for a random 
r). Suppose you condition that distribution on an event E of probability p. How 
much computational entropy is left? 

To make this question more precise, we should note that computational en- 
tropy is parameterized by quality (how distinguishable is X from a variable Z 
that has true entropy) and quantity (how much true entropy is there in Z). 

We prove an intuitively natural result: conditioning on an event of probability
p reduces the quality of metric entropy by a factor of p and the quantity of metric
entropy by log_2(1/p) (note that this means that the reduction in quantity and
quality is the same, because the quantity of entropy is measured on log scale).
Naturally, the answer becomes so simple only once the correct notion of entropy
is in place. Our result holds for Metric* entropy (defined in [2,18]). This entropy
is convertible (with some loss) to HILL entropy [26,2], which can then be used
with randomness extractors to get pseudorandom bits.
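For intuition, the information-theoretic analogue of this statement is easy to check numerically: conditioning on an event of probability p costs at most log_2(1/p) bits of min-entropy. The following is a minimal sketch under our own assumptions (distributions as Python dictionaries, a hypothetical 4-bit example); it says nothing about the computational setting, where quality also degrades.

```python
import math

def min_entropy(dist):
    """H_inf(X) = -log2(max_x Pr[X = x]) for dist = {value: probability}."""
    return -math.log2(max(dist.values()))

def condition(dist, event):
    """Return the conditional distribution X | E and Pr[E] for a predicate event."""
    p_event = sum(p for v, p in dist.items() if event(v))
    cond = {v: p / p_event for v, p in dist.items() if event(v)}
    return cond, p_event

# Hypothetical example: X uniform on 4-bit values, E = "X is a multiple of 4".
x = {v: 1 / 16 for v in range(16)}
x_given_e, p_e = condition(x, lambda v: v % 4 == 0)

# Entropy lost by conditioning, to be compared with log2(1 / Pr[E]).
loss = min_entropy(x) - min_entropy(x_given_e)
```

In this example the bound is met with equality, because the event removes whole outcomes of a uniform distribution.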

Our result improves the bounds of Dziembowski and Pietrzak [18, Lemma 3],
where the loss in the quantity of entropy was related to its original quality. The
use of metric entropy simplifies the analogous result of Reingold et al. [36,
Theorem 1.3] for HILL entropy. (See [20] for information on other related work
[22, Lemma 3.1] and [11, Lemma 16].)

We use this result to show that randomness extractors can be used to convert
a hardcore function into a robust one, through a computational entropy
argument for slightly induced distributions. The result is also applicable to
leakage-resilient cryptography, as demonstrated by [18]. To make the result useful in
more contexts, we also provide an average-case entropy formulation, which can
be helpful in situations in which not all leakage is equally informative. For the
information-theoretic case, it is known that leakage of λ bits reduces the average
entropy by at most λ ([15, Lemma 2.2]). We show essentially the same⁴ for the
computational case: if λ bits of information are leaked, then the amount of
computational Metric* entropy decreases by at most λ and its quality decreases by
at most 2^λ (again, this entropy can be converted to HILL entropy and be used
in randomness extractors [15,28]).


(CROOKED) LEFTOVER HASH LEMMA FOR CORRELATED DISTRIBUTIONS. We
show that the leftover hash lemma (LHL) Lemma 4.8], as well as its generalized
form [15, Lemma 2.4] and the “crooked” LHL [16]) extend in a natural way to
“correlated” distributions. That is, suppose we have t random variables (sources)
X_1, ..., X_t, where each X_i individually has high min-entropy but may be fully
determined by the outcome of some other X_j (though we assume X_i ≠ X_j for all
i ≠ j). We would like to apply a hash function H such that H(X_1), ..., H(X_t)
is indistinguishable from t independent copies of the uniform distribution on the
range of H (also over the choice of the key for H, which is made public). We
show that this is the case assuming H is 2t-wise independent. (The standard
LHL is thus t = 1; previously, Kiltz et al. showed this for t = 2.) Naturally,
this requires the output size of H to be about a 1/t fraction of its input size, so
there is enough entropy to extract.

3 One could alternatively define robustness as one that remains hardcore on inputs of
slightly lower entropy; however, in our proofs of robustness we would then need to
go through an additional argument that distributions of lower entropy are induced
by distributions of higher entropy.

4 In case of randomized leakage, the information-theoretic result of [15, Lemma 2.2(b)]
gives better bounds.


2 Preliminaries 


We omit standard cryptographic definitions (see the full version [20] for precise
definitions). The security parameter is denoted by k, and 1^k denotes the string
of k ones. Vectors are denoted in boldface, for example x. For convenience, we
extend algorithmic notation to operate on each vector of inputs component-wise.
For example, if A is an algorithm and x, y are vectors then z ← A(x, y)
denotes that z[i] ← A(x[i], y[i]) for all 1 ≤ i ≤ |x|. We write P_X for the
distribution of random variable X and P_X(x) for the probability that X puts on
value x ∈ 𝒳, i.e., P_X(x) = Pr[X = x]. Denote by |X| the size of the support of X,
i.e., |X| = |{x s.t. P_X(x) > 0}|. We often identify X with P_X when there is no
danger of confusion. For a function f : 𝒳 → ℝ, we denote the expectation of f
over X by

\[ \mathbb{E}\, f(X) = \mathbb{E}_{x \leftarrow X} f(x) = \sum_{x \in \mathcal{X}} P_X(x) f(x). \]

We will use the notions of min-entropy and average min-entropy (defined
in [15]). For vector-valued X the min-entropy is the minimum of the
components (see [3,5]). We use the standard notions of collision probability of X,
denoted Col(X), and statistical distance of X and Y, denoted Δ(X, Y). We denote
the computational distance between two random variables X, Y with respect to
a distinguisher D as δ^D(X, Y).
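As a small illustration of the information-theoretic quantities just recalled, the following Python sketch computes min-entropy, collision probability, and statistical distance for finite distributions. The dictionary representation and the function names are our own assumptions for illustration and are not part of the paper.

```python
import math

def min_entropy(dist):
    """H_inf(X) = -log2(max_x P_X(x)) for dist = {value: probability}."""
    return -math.log2(max(dist.values()))

def collision_probability(dist):
    """Col(X) = sum_x P_X(x)^2."""
    return sum(p * p for p in dist.values())

def statistical_distance(dist_x, dist_y):
    """Delta(X, Y) = 1/2 * sum_v |P_X(v) - P_Y(v)| over the joint support."""
    support = set(dist_x) | set(dist_y)
    return 0.5 * sum(abs(dist_x.get(v, 0.0) - dist_y.get(v, 0.0)) for v in support)

# Two hypothetical distributions on {0, 1, 2, 3}.
uniform4 = {v: 0.25 for v in range(4)}
skewed4 = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
```

Note that the skewed distribution has only 1 bit of min-entropy even though its support has four elements, which is why min-entropy rather than support size is the relevant measure for DE.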

Dodis et al. [15, Lemma 2.2] characterized the effect of auxiliary information
on average min-entropy, namely,

\[ \tilde{H}_\infty(A \mid (B, C)) \ge \tilde{H}_\infty((A, B) \mid C) - |B| \ge \tilde{H}_\infty(A \mid C) - |B|. \]
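The following sketch checks a special case of this inequality (with C empty) on a toy joint distribution. We assume here that the loss term is the bit length λ of B, i.e., that B takes at most 2^λ values; this reading, the dictionary representation, and the function names are our own assumptions.

```python
import math

def avg_min_entropy(joint):
    """Average min-entropy of A given B for joint = {(a, b): probability}.

    Computes -log2( sum_b max_a Pr[A = a, B = b] ).
    """
    best = {}
    for (a, b), p in joint.items():
        best[b] = max(best.get(b, 0.0), p)
    return -math.log2(sum(best.values()))

def min_entropy_of_a(joint):
    """Min-entropy of the marginal distribution of A."""
    marginal = {}
    for (a, _b), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    return -math.log2(max(marginal.values()))

# Hypothetical example: A uniform on 3-bit values, B = low bit of A (lam = 1).
joint = {(a, a & 1): 1 / 8 for a in range(8)}
lam = 1
```

Here one leaked bit costs exactly one bit of average min-entropy, so the bound is tight for this example.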

We will use extractors (defined in [33]) and average-case extractors (defined 
in Section 2.5]) and denote both by ext. 

For a (probabilistic) public-key encryption scheme, which is a triple of
algorithms Π = (𝒦, ℰ, 𝒟) defined in the usual way, we will use the standard notion
of IND-CPA security as defined in [24].

We use the standard definition of a lossy trapdoor function (LTDF) genera- 
tor (defined in [35]) which we denote as a pair LTDF = (F, F’) of algorithms. 


COMPUTATIONAL ENTROPY. We use the standard notion of HILL entropy as
defined in [26]. Additionally, we use a notion known as “metric-star” entropy (this
notion was used in [18,21]):

Definition 1. A distribution X has Metric* entropy at least k, denoted
H^{Metric*}_{ε,s}(X) ≥ k, if for all deterministic distinguishers D of size at most s, with
outputs in [0, 1], there exists a distribution Y with H_∞(Y) ≥ k and δ^D(X, Y) ≤ ε.


Equivalence (with a loss in quality) between Metric* and HILL entropy was
shown in [2, Theorem 5.2]. Extractors can be applied to distributions with
computational entropy to obtain pseudorandom outputs. This is well-known for HILL
entropy, but the only known way to extract from Metric* entropy is first to
convert Metric* to HILL entropy by using [2, Theorem 5.2]. Conditional entropy
has been extended to the computational case (for both HILL [28] and Metric
entropy [21]). Conditional Metric* can be defined similarly, by making the
distinguisher deterministic with outputs in [0,1]. The Metric* to HILL conversion
can be extended to the conditional case as shown in Lemma 18], [21,
Theorem 2.7]. Average-case extractors can be used on distributions with conditional
Metric* entropy by first applying [21, Theorem 2.7].


2.1 Deterministic Encryption 
An encryption scheme Π = (𝒦, ℰ, 𝒟) is deterministic if ℰ is deterministic.


SEMANTIC SECURITY OF DE. We recall the semantic-security style PRIV notion
for DE from [3]. (More specifically, it is a “comparison-based” semantic-security
style notion; this was shown equivalent to a “simulation-based” formulation
in [5].) To an encryption scheme Π = (𝒦, ℰ, 𝒟), an adversary A = (A_0, A_1, A_2),
and k ∈ ℕ we associate the left-most and middle experiments in Figure 1. We
require that there are functions v = v(k), ℓ = ℓ(k) such that (1) |x| = v, (2)
|x[i]| = ℓ for all 1 ≤ i ≤ v, and (3) the x[i] are all distinct with probability 1
over (x, t) ←$ A_1(state) for any state output by A_0. (Since in this work we only
consider the definition relative to deterministic Π, requirement (3) is without loss
of generality.) In particular we say A outputs vectors of size v for v as above.
Define the PRIV advantage of A against Π as

\[ \mathbf{Adv}^{\mathrm{priv}}_{\Pi,A}(k) = \Pr\big[\mathbf{Exp}^{\mathrm{priv\text{-}1}}_{\Pi,A}(k) \Rightarrow 1\big] - \Pr\big[\mathbf{Exp}^{\mathrm{priv\text{-}0}}_{\Pi,A}(k) \Rightarrow 1\big]. \]

Let 𝕄 be a class of distributions on message vectors. Define 𝔸_𝕄 to be the class
of adversaries {A = (A_0, A_1, A_2)} such that for each A ∈ 𝔸_𝕄 there is an M ∈ 𝕄
for which x has distribution M over (x, t) ←$ A_1(state) for any state output by
A_0. We say that Π is PRIV secure for 𝕄 if Adv^{priv}_{Π,A}(·) is negligible for any PPT
A ∈ 𝔸_𝕄. Note that (allowing non-uniform adversaries as usual) we can without
loss of generality consider only those A with “empty” A_0, since A_1 can always
be hardwired with the “best” state. However, following [5] we explicitly allow
state because it greatly facilitates some proofs.


INDISTINGUISHABILITY OF DE. Next we recall the indistinguishability-based
formulation of security for DE [5,7]. To an encryption scheme Π = (𝒦, ℰ, 𝒟),
an adversary D = (D_1, D_2), and k ∈ ℕ we associate the right-most experiment
in Figure 1. We make the analogous requirements on D_1 as on A_1 in the
PRIV definition. Define the IND advantage of D against Π as

\[ \mathbf{Adv}^{\mathrm{ind}}_{\Pi,D}(k) = 2 \cdot \Pr\big[\mathbf{Exp}^{\mathrm{ind}}_{\Pi,D}(k) \Rightarrow 1\big] - 1. \]

Let 𝕄* be a class of pairs of distributions on message vectors. Define 𝔻_{𝕄*}
to be the class of adversaries {D = (D_1, D_2)} such that
for each D ∈ 𝔻_{𝕄*}, there is a pair of distributions (M_0, M_1) ∈ 𝕄* such that
for each b ∈ {0,1} the distribution of x ←$ D_1(b) is M_b. We say that Π is IND
secure for 𝕄* if Adv^{ind}_{Π,D}(·) is negligible for any PPT D ∈ 𝔻_{𝕄*}.


Experiment Exp^{priv-1}_{Π,A}(k):
  (pk, sk) ←$ 𝒦(1^k)
  state ←$ A_0(1^k)
  (x_1, t_1) ←$ A_1(state)
  c ← ℰ(pk, x_1)
  g ←$ A_2(pk, c, state)
  If g = t_1 ret 1 else ret 0

Experiment Exp^{priv-0}_{Π,A}(k):
  (pk, sk) ←$ 𝒦(1^k)
  state ←$ A_0(1^k)
  (x_1, t_1), (x_0, t_0) ←$ A_1(state)
  c ← ℰ(pk, x_0)
  g ←$ A_2(pk, c, state)
  If g = t_1 ret 1 else ret 0

Experiment Exp^{ind}_{Π,D}(k):
  (pk, sk) ←$ 𝒦(1^k)
  b ←$ {0,1}; (x, t) ←$ D_1(b)
  c ← ℰ(pk, x)
  d ←$ D_2(pk, c)
  If b = d ret 1 else ret 0


Fig. 1. Security experiments for deterministic encryption 
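To make the role of the min-entropy requirement concrete, the following Python sketch runs an IND-style experiment against a deterministic scheme when the two message distributions are point distributions (zero min-entropy). The adversary simply re-encrypts a candidate message and compares. The hash-based det_encrypt is a hypothetical stand-in for a deterministic encryption algorithm (it cannot even be decrypted), and the message vector has a single component; both are our simplifying assumptions.

```python
import hashlib
import secrets

def keygen():
    # Stand-in key generation: the "public key" is just a random salt.
    pk = secrets.token_bytes(16)
    return pk, None  # no secret key is needed for this experiment

def det_encrypt(pk, x):
    # Hypothetical deterministic "encryption" stub: same (pk, x) gives same c.
    return hashlib.sha256(pk + x).digest()

M0, M1 = b"attack at dawn!!", b"retreat at noon!"

def d1(b):
    # Point distributions: D_1(b) always outputs the fixed message M_b.
    return M1 if b else M0

def d2(pk, c):
    # Re-encrypt the candidate M1 under the public key and compare.
    return 1 if det_encrypt(pk, M1) == c else 0

def ind_experiment():
    pk, _sk = keygen()
    b = secrets.randbelow(2)
    c = det_encrypt(pk, d1(b))
    return 1 if d2(pk, c) == b else 0

def ind_advantage(trials=200):
    # Empirical version of 2 * Pr[experiment outputs 1] - 1.
    wins = sum(ind_experiment() for _ in range(trials))
    return 2 * wins / trials - 1
```

The adversary wins every run, which is why security for deterministic schemes can only be required for message distributions with high min-entropy.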


3 Our Tools 


3.1 A Precise Definitional Equivalence for DE 


While the PRIV definition is meaningful with respect to a single message
distribution M, the IND definition must inherently talk of pairs of different message
distributions. Thus, in proving an equivalence between the two notions, the best
we can hope to show is that PRIV security for a message distribution M is
equivalent to IND security for some class of pairs of message distributions
(depending on M). However, prior works did not provide such a statement.
Instead, they showed that PRIV security on all distributions of a given entropy μ
is equivalent to IND security on all pairs of distributions of slightly less entropy.


INDUCED DISTRIBUTIONS. To state our result we first give some definitions
relating to a notion of “induced distributions.” Let X, X’ be distributions (or random
variables) on the same domain. For α ∈ ℕ, we say that X’ is an α-induced
distribution of X if X’ is a conditional distribution X’ = X | E for an event E such
that Pr[E] ≥ 2^{-α}. We call E the corresponding event to X’. We require that the
pair (X, E) is efficiently samplable. Define X[α] to be the class of all α-induced
distributions of X. Furthermore, let X_0, X_1 be two α-induced distributions of
X with corresponding events E_0, E_1 respectively. Define X*[α] = {(X_0, X_1)} to
be the class of all pairs (X_0, X_1) for which there is a pair (X’_0, X’_1) of α-induced
distributions of X such that X_0 (resp. X_1) is statistically close to X’_0 (resp. X’_1).⁵


THE EQUIVALENCE. We are now ready to state our result. The following theorem 
captures the “useful” direction that IND implies PRIV: 


5 We need to allow a negligible statistical distance for technical reasons. Since we
will be interested in indistinguishability of functions of these distributions this will
not make any appreciable difference, and hence we mostly ignore this issue in the
remainder of the paper.

Theorem 1. Let Π = (𝒦, ℰ, 𝒟) be an encryption scheme. For any distribution
M on message vectors, PRIV security of Π with respect to M is implied by
IND security of Π with respect to M*[2]. In particular, let A ∈ 𝔸_M be a PRIV
adversary against Π. Then there is an IND adversary D ∈ 𝔻_{M*[2]} such that for
all k ∈ ℕ

\[ \mathbf{Adv}^{\mathrm{priv}}_{\Pi,A}(k) \le 162 \cdot \mathbf{Adv}^{\mathrm{ind}}_{\Pi,D}(k) + \left(\tfrac{3}{4}\right)^{k}. \]

Furthermore, the running time of D is at most that of k executions
of A (but 4 in expectation).


The theorem essentially follows from the techniques of [5]; details are given
in [20]. Thus, our contribution here is not in providing any new technical tools
used in proving this result but rather in extracting it from the techniques of [5].
In particular, our more precise statement allows us to use results about en- 
tropy of conditional distributions, which we explain next. Looking ahead, it also 
simplifies proofs for schemes based on one-wayness, because it is easy to argue 
that one-wayness is preserved on slightly induced distributions (the alternative 
would require an argument that distributions of lower entropy are induced by 
distributions of higher entropy). 
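One way to read the running-time remark in Theorem 1 is as rejection sampling: to draw from a distribution induced by an event of probability at least 1/4, one samples repeatedly until the event occurs, giving up after k attempts. The sketch below illustrates only this counting argument; it is our own interpretation, not the reduction from [20], and the sampler, event, and function names are hypothetical.

```python
import random

def sample_induced(sample_x, event, k, rng):
    """Try up to k times to draw x from X conditioned on event(x)."""
    for _ in range(k):
        x = sample_x(rng)
        if event(x):
            return x
    return None  # all k attempts missed the event

def failure_bound(k, p_event=0.25):
    """Probability that k independent attempts all miss an event of prob. p_event."""
    return (1 - p_event) ** k

def expected_attempts(p_event=0.25):
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1 / p_event

# Hypothetical example: X uniform on {0,...,7}, E = "X < 2", so Pr[E] = 1/4.
rng = random.Random(0)
draw = sample_induced(lambda r: r.randrange(8), lambda x: x < 2, 64, rng)
```

With an event of probability exactly 1/4, the expected number of attempts is 4 and all k attempts fail with probability (3/4)^k.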

To establish a definitional equivalence, that is, to also show that PRIV implies
IND, we need to further restrict the latter to pairs (that are statistically close
to pairs) of complementary 2-induced distributions of M (which we did not do
above for conceptual simplicity), where we call X_0, X_1 complementary if E_1 is
the complement of E_0. We stress that this further restriction is not needed for
the “useful” implication above and for our security proofs.


3.2 Measuring Computational Entropy of Induced Distributions 


We study how conditioning a distribution reduces its computational entropy.
This result is used later in the work to show that randomness extractors can
convert a hardcore function into a robust one; it is also applicable to
leakage-resilient cryptography. This result is simplest to understand when stated in
terms of Metric* computational entropy (defined in [18]). It is easy to see that
conditioning on an event E with probability P_E reduces (information-theoretic)
min-entropy by at most log 1/P_E. We show that the same holds for the
computational notion of Metric* entropy if one considers reduction in both quantity
and quality:


Lemma 1. Let X, Y be discrete random variables. Then

\[ H^{\mathrm{Metric}^*}_{\varepsilon / P_Y(y),\, s'}(X \mid Y = y) \ge H^{\mathrm{Metric}^*}_{\varepsilon,\, s}(X) - \log 1/P_Y(y), \quad \text{where } s' \approx s. \]


The use of Metric* entropy and an improved proof allow for a simpler and
tighter formulation than results of [18, Lemma 3] and [36, Theorem 1.3] (see the
full version for a comparison [20]). The proof is similar to [36] and can be found
in the full version [20].

If we now consider averaging over all values of Y, we obtain the following
simple formulation that expresses how much average entropy is left in X from
the point of view of someone who knows Y. (This scenario naturally occurs in
leakage-resilient cryptography, as exemplified in [18].)


Theorem 2. Let X, Y be discrete random variables. Then

\[ H^{\mathrm{Metric}^*}_{\varepsilon |Y|,\, s'}(X \mid Y) \ge H^{\mathrm{Metric}^*}_{\varepsilon,\, s}(X) - \log |Y|, \quad \text{where } s' \approx s. \]


This statement is similar to the statement in the information-theoretic case
(where the reduction is only in quantity) from [15, Lemma 2.2]. In the full
version [20], we compare the theorem to [11, Lemma 16] and [22, Lemma 3.1].
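The bookkeeping implied by these statements can be sketched as follows: each conditioning step subtracts from the quantity of entropy and multiplies the quality parameter, so several steps compose by simple arithmetic. This is only parameter accounting under our own assumptions (the class and function names are hypothetical, and the change in circuit size from s to s' is ignored).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricStar:
    k: float    # quantity: bits of entropy
    eps: float  # quality: distinguishing advantage

def condition_on_event(h, p_event):
    """Single event of probability p: lose log2(1/p) bits, quality worsens by 1/p."""
    return MetricStar(h.k - math.log2(1 / p_event), h.eps / p_event)

def leak_bits(h, lam):
    """Average case over lam leaked bits: lose lam bits, quality worsens by 2**lam."""
    return MetricStar(h.k - lam, h.eps * 2 ** lam)

# Hypothetical starting point: 128 bits of entropy with quality 2^-80.
start = MetricStar(128, 2.0 ** -80)
after_two_leaks = leak_bits(leak_bits(start, 10), 6)
```

Leaking 10 bits and then 6 bits gives the same parameters as leaking 16 bits at once, which is one reason to stay in Metric* entropy across repeated conditioning.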

To apply a randomness extractor, we must convert conditional Metric* to
conditional HILL entropy using [21, Theorem 2.7]; this conversion loses some
quality. Thus, the conversion should be applied only when necessary (for
instance, repeated conditioning is best measured in Metric* entropy, and then
converted to HILL entropy once at the end). Here we provide a “HILL-to-HILL”
formulation of Lemma 1:


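Corollary 1. Let X be a discrete random variable over χ and let Y be a discrete
random variable. Then,

\[ H^{\mathrm{HILL}}_{\varepsilon',\, s'}(X \mid Y = y) \ge H^{\mathrm{HILL}}_{\varepsilon,\, s}(X) - \log 1/P_Y(y), \]

where \( \varepsilon' = \varepsilon / P_Y(y) + \sqrt[3]{\log|\chi| / s} \) and \( s' = \Omega\big(\sqrt[3]{s / \log|\chi|}\big) \).


The Corollary follows by combining Lemma 1 and [2, Theorem 5.2], and setting
\( \varepsilon_{\mathrm{HILL}} = \sqrt[3]{\log|\chi| / s} \) (see the full version for justification of parameters [20]).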


3.3 A (Crooked) Leftover Hash Lemma for Correlated Distributions 


The following generalization of the (Crooked) LHL to correlated input
distributions will be very useful to us when considering bounded multi-message security
in Section 6. Since our generalization of the classical LHL is a special case of our
generalization of the Crooked LHL, we just state the latter here.


Lemma 2. (CLHL for Correlated Sources) Let H : 𝒦 × D → R be a 2t-wise
δ-dependent function for t > 0 with range R, and let f : R → S be a
function. Let X = (X_1, ..., X_t) where the X_i are random variables over D such
that H_∞(X_i) ≥ μ for all 1 ≤ i ≤ t and moreover Pr[X_i = X_j] = 0 for all
1 ≤ i ≠ j ≤ t. Then

\[ \Delta\big((K, f(H(K, \mathbf{X}))),\ (K, f(\mathbf{U}))\big) \le \frac{1}{2}\sqrt{|S|^{t}\,\big(t^{2}\, 2^{-\mu} + 3\delta\big)} \]

where K ←$ 𝒦 and U = (U_1, ..., U_t) where the U_i are all uniform and
independent over R (recall that functions operate on vectors X and U component-wise).

One can further extend Lemma 2 to the case of average conditional min-entropy
using the techniques of [15]. Note that the lemma implies the corresponding
generalization of the classical LHL by taking H to have range S and f to be
the identity function. The proof of the lemma, which extends the proof of the
Crooked LHL in [7], is given in the full version [20].
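As a sanity check on a toy instance, the sketch below computes the statistical distance in the lemma exactly by enumeration, for t = 2 fully correlated sources, and compares it with the bound ½·sqrt(|S|^t (t²·2^(-μ) + 3δ)), which is how we read the lemma's right-hand side. The concrete choices are our own assumptions: the hash family is degree-3 polynomials over GF(7) (a 4-wise independent family, so δ = 0), X_2 = X_1 + 1, and f reduces modulo 2.

```python
import itertools
import math

P = 7        # field size; domain D = range R = GF(P)
T = 2        # number of correlated sources
S_SIZE = 2   # size of the final range S = {0, 1}
DELTA = 0.0  # the polynomial family is exactly 2T-wise independent

def h(key, x):
    """Degree-3 polynomial over GF(P) with coefficients key = (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = key
    return (a0 + a1 * x + a2 * x * x + a3 * x * x * x) % P

def f(y):
    """The 'crooked' part: compress R = GF(P) to S = {0, 1}."""
    return y % 2

def distance_to_ideal():
    # Distribution of (f(U1), f(U2)) for independent uniform U1, U2 over R.
    pf = [sum(1 for y in range(P) if f(y) == s) / P for s in range(S_SIZE)]
    ideal = {(s1, s2): pf[s1] * pf[s2] for s1 in range(S_SIZE) for s2 in range(S_SIZE)}
    keys = list(itertools.product(range(P), repeat=2 * T))
    total = 0.0
    for key in keys:
        counts = {pair: 0 for pair in ideal}
        for x1 in range(P):        # X1 uniform on GF(P)
            x2 = (x1 + 1) % P      # X2 determined by X1 and never equal to it
            counts[(f(h(key, x1)), f(h(key, x2)))] += 1
        total += 0.5 * sum(abs(counts[s] / P - ideal[s]) for s in ideal)
    return total / len(keys)       # the key K is public, so average over it

mu = math.log2(P)  # min-entropy of each individual source
bound = 0.5 * math.sqrt(S_SIZE ** T * (T * T * 2 ** (-mu) + 3 * DELTA))
distance = distance_to_ideal()
```

At this toy size the bound is weak (about 0.76); it becomes meaningful only when μ is much larger than t·log|S|, in line with the remark that the output should be about a 1/t fraction of the input.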


4 Encrypt-with-Hardcore Scheme from Robust HCFs 


We define a new notion of robust HCFs. Intuitively, robust HCFs are those that 
remain hardcore when the input is conditioned on any event that occurs with 
good probability. 


Definition 2. Let F be a TDF generator and let hc be a HCF such that hc is
hardcore for F with respect to a distribution X on input vectors. For α = α(k),
we say hc is α-robust for F on X if hc is also hardcore for F with respect to the
class X[α] of α-induced distributions of X.


DISCUSSION. Robustness is interesting even for the classical definition of
hardcore bits, where hc is boolean and a single uniform input x is generated in the
security experiment. Here robustness means that hc remains hardcore even when
x is conditioned on an event that occurs with good probability. It is clear that
not every hardcore bit in the classical sense is robust: note, for example, that
while every bit of the input to RSA is well-known to be hardcore assuming RSA
is one-way [1], they are not even 1-robust since we may condition on a particular
bit of the input being a fixed value.
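The counterexample can be spelled out in a few lines. The sketch below takes the least significant bit as the candidate hardcore bit (a hypothetical choice for illustration) and conditions a uniform 8-bit input on that bit being 0; the event has probability 1/2, so the conditioned distribution is 1-induced, yet the bit is now constant.

```python
def lsb(x):
    """A 'natural' candidate hardcore bit: the least significant bit of x."""
    return x & 1

N_BITS = 8
domain = range(2 ** N_BITS)

# Event E = {lsb(x) = 0}; it has probability 1/2 >= 2^-1.
induced = [x for x in domain if lsb(x) == 0]
pr_event = len(induced) / len(domain)

# On X | E the bit is constant, so always guessing 0 is always correct,
# regardless of how hard the underlying function is to invert.
prediction_accuracy = sum(1 for x in induced if lsb(x) == 0) / len(induced)
```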


THE SCHEME. Let Π = (𝒦, ℰ, 𝒟) be a probabilistic encryption scheme, F be
a TDF generator, and hc be a HCF. Assume that hc outputs binary strings
of the same length as the random string r needed by ℰ. Define the associated
“Encrypt-with-Hardcore” deterministic encryption scheme EwHCore[Π, F, hc] =
(DK, DE, DD) with plaintext-space PtSp = {0,1}^k via

Alg DK(1^k):
  (pk, sk) ←$ 𝒦(1^k)
  (f, f^{-1}) ←$ F(1^k)
  Return ((pk, f), (sk, f^{-1}))

Alg DE((pk, f), x):
  r ← hc_f(x)
  c ← ℰ(pk, f(x); r)
  Return c

Alg DD((sk, f^{-1}), c):
  y ← 𝒟(sk, c)
  x ← f^{-1}(y)
  Return x
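The three algorithms can be sketched in Python as follows. Everything concrete here is an assumption made for illustration and is completely insecure at these sizes: the probabilistic scheme is textbook ElGamal over a tiny prime field with explicit coins, the TDF is textbook RSA with modulus 61·53, and the hardcore function is a hash of the input used as a random-oracle stand-in. The sketch needs Python 3.8 or later for the modular inverse via pow.

```python
import hashlib
import secrets

P, G = 3571, 2                     # toy ElGamal group Z_P^* (P prime)
RSA_P, RSA_Q, RSA_E = 61, 53, 17   # toy RSA parameters
N = RSA_P * RSA_Q                  # N < P, so f(x) fits as an ElGamal message

def K():
    sk = secrets.randbelow(P - 2) + 1
    return pow(G, sk, P), sk       # (pk, sk)

def E(pk, m, r):
    """Randomized encryption of m in [1, P-1] with explicit coins r."""
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def D(sk, c):
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - sk, P)) % P   # c1^(P-1-sk) equals c1^(-sk)

def F():
    d = pow(RSA_E, -1, (RSA_P - 1) * (RSA_Q - 1))
    return (N, RSA_E), (N, d)      # descriptions of f and f^{-1}

def evaluate(key, x):
    n, exponent = key
    return pow(x, exponent, n)

def hc(f, x):
    """Stand-in hardcore function: hash of (f, x) reduced to a coin value."""
    digest = hashlib.sha256(repr((f, x)).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1)

def DK():
    pk, sk = K()
    f, f_inv = F()
    return (pk, f), (sk, f_inv)

def DE(pub, x):
    pk, f = pub
    r = hc(f, x)                   # coins are derived from the message itself
    return E(pk, evaluate(f, x), r)

def DD(sec, c):
    sk, f_inv = sec
    return evaluate(f_inv, D(sk, c))
```

For example, after `pub, sec = DK()`, calling `DE(pub, 1234)` twice returns the same ciphertext, and `DD(sec, DE(pub, 1234))` returns 1234; messages are restricted to 1 ≤ x < N in this toy version.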


SECURITY ANALYSIS. To gain some intuition, suppose hc is hardcore for F on
some distribution X on input vectors. One might think that PRIV security of
EwHCore = EwHCore[Π, F, hc] on X then follows by IND-CPA security of Π.
However, this is not true. For example, hc may be a “natural” hardcore function
(i.e., that outputs some bits of the input), and ℰ may output some of its coins
in the clear. This is how our notion of robustness comes into play, giving us the
following theorem (for a proof and further discussion, see [20]):


Theorem 3. Suppose Π is IND-CPA secure and hc is 2-robust for F on a
distribution M on input vectors. Then EwHCore[Π, F, hc] is PRIV-secure on M.


5 Single-Message Instantiations of EwHCore


5.1 Getting Robust Hardcore Functions


AUGMENTED TRAPDOOR FUNCTIONS. In order to describe the conversion
procedure, it is useful to introduce the notion of an “augmented” version of a TDF,
which augments the description of the TDF with keying material for a HCF.
More formally, let F be a trapdoor function generator and let H be a keyed
function with keyspace 𝒦. Define the H-augmented version of F, denoted F[H],
that on input 1^k returns ((f, K), (f^{-1}, K)) where (f, f^{-1}) ←$ F(1^k) and K ←$ 𝒦;
evaluation is defined for x ∈ {0,1}^k as f(x) (i.e., evaluation just ignores K) and
inversion is defined analogously.


MAKING ANY LARGE HARDCORE FUNCTION ROBUST. We show that by applying 
a randomness extractor in a natural way, one can convert any large hardcore 
function in the standard sense to one that is robust (with some loss in parame- 
ters). However, while the conversion procedure is natural, proving that it works 
turns out to be non-trivial. 

Let F be a TDF generator, and let hc : {0,1}^k → {0,1}^ℓ be an HCF for F
on an input distribution X such that H_∞(X) ≥ μ. Let ext : {0,1}^ℓ × {0,1}^d →
{0,1}^m × {0,1}^d be a strong average-case (ℓ - α, ε_ext)-extractor for α ∈ ℕ. (Here
we view ext as a keyed function with the second argument as the key.) Define
a new “extractor-augmented” HCF hc[ext] for F[ext] such that hc[ext]_s(x) =
ext(hc(x), s) for all x ∈ {0,1}^k and s ∈ {0,1}^d. The following characterizes the
α-robustness of hc[ext].


Lemma 3. Fix X’ ∈ X[α], and suppose there is a distinguisher D’ against
hc[ext] on X’. Then there is a distinguisher D against hc on X such that for
all k ∈ ℕ


Adve ncjext], p (k) <O (y Adv x ne, p(k) + 2° Advin p(k)) + Eext - 


Furthermore, the running-time of D is O((tp:(k+£))?), where tp: is the running- 
time of D. 


Note that when α = log(k) the security loss in the reduction is polynomial (in
our application we just need α = 2). The proof, which appears in the full version
[20], relies crucially on Corollary 1.
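The conversion itself is a one-line composition, sketched below. The concrete pieces are our own assumptions: hc is a truncated hash standing in for an ℓ-bit hardcore function, and ext is a seeded GF(2)-linear hash (a universal hash family, hence an extractor by the leftover hash lemma) that returns its seed together with the output, as a strong extractor does. The parameters are far too small to be meaningful.

```python
import hashlib

ELL, M_OUT = 16, 4   # hc outputs ELL bits; ext outputs M_OUT bits plus the seed

def hc(x):
    """Stand-in ELL-bit hardcore function (hypothetical): a truncated hash."""
    return int.from_bytes(hashlib.sha256(x).digest()[: ELL // 8], "big")

def ext(y, seed):
    """Seeded linear hash over GF(2).

    The seed is a tuple of M_OUT row masks of ELL bits each; output bit i is
    the parity of (row_i AND y). The seed is returned alongside the output.
    """
    out = 0
    for row in seed:
        out = (out << 1) | (bin(row & y).count("1") & 1)
    return out, seed

def hc_ext(seed, x):
    """Extractor-augmented HCF: hc[ext]_s(x) = ext(hc(x), s)."""
    return ext(hc(x), seed)
```

In the augmented TDF F[ext], the seed s plays the role of the extra key K that is published with f and ignored by evaluation.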

The above conversion procedure notwithstanding, we give specific examples 
of hardcore functions that are already robust. 


ROBUST GOLDREICH-LEVIN BITS FOR ANY TDF. In [20] we show that the 
Goldreich-Levin [23] (GL) hardcore function is robust. Specifically, if the function 
that extracts i-many independent GL bits is hardcore for F, then it is also 
O(log k)-robust for F. 
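For reference, a GL bit is the inner product modulo 2 of the input with a public random string. A minimal sketch, with bit strings represented as Python integers (our own representation choice):

```python
def gl_bit(x, r):
    """Goldreich-Levin bit: inner product of the bit strings x and r modulo 2."""
    return bin(x & r).count("1") & 1

def gl_bits(x, rs):
    """Several GL bits of the same input x, one per public string in rs."""
    return [gl_bit(x, r) for r in rs]
```

The bit is linear in x, that is, `gl_bit(x ^ y, r) == gl_bit(x, r) ^ gl_bit(y, r)`, which is the property the Goldreich-Levin reconstruction argument exploits.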


ROBUST BITS FOR ANY LTDF. Peikert and Waters [35] showed that LTDFs
admit a simple, large hardcore function, namely a pairwise-independent hash
function (the same argument applies also to universal hash functions or, more
generally, randomness extractors). By using average conditional min-entropy,
in [20] we show that this hardcore function is O(log k)-robust.


5.2 Putting It Together 


Equipped with the above results, we describe instantiations of the Encrypt-with- 
Hardcore scheme that both explain prior constructions and produce novel ones. 


USING AN ITERATED TRAPDOOR PERMUTATION. The prior trapdoor-permutation-based
DE scheme of Bellare et al. [5] readily provides an instantiation of
EwHCore by using an iterated trapdoor permutation as the TDF. Let F be a TDP
and hc be a hardcore bit for F. For i ∈ ℕ denote by F^i the TDP that iterates F
i-many times. Define the Blum-Micali-Yao (BMY) hardcore function for
F^i via BMY^i[hc](f, x) = hc(x) || hc(f(x)) || ... || hc(f^{i-1}(x)). Bellare et al. [5] used
the specific choice of hc = GL (the Goldreich-Levin bit) in their scheme, which is
explained by the fact that the GL bit is robust, and one can show that BMY
iteration expands one robust hardcore bit to many (on a non-uniform distribution,
the bit should be hardcore on all “permutation distributions” of the former).
However, due to our augmentation procedure to make any large hardcore
function robust, we are no longer bound to any specific choice of hc. For example,
we may choose hc to be a natural hardcore bit. In fact, it may often be the
case that F has many simultaneously hardcore natural bits, and therefore our
construction will require fewer iterations of the TDP than the construction of [5].
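The BMY iteration is straightforward to write down. In the sketch below the permutation is textbook RSA with a toy modulus and hc is the least significant bit; both are hypothetical stand-ins chosen only so the example runs.

```python
def bmy(hc, f, x, i):
    """BMY^i[hc](f, x) = hc(x) || hc(f(x)) || ... || hc(f^{i-1}(x)), as a bit list."""
    bits = []
    for _ in range(i):
        bits.append(hc(x))
        x = f(x)
    return bits

N, E_EXP = 3233, 17   # toy RSA permutation on Z_N (insecure size)

def rsa(x):
    return pow(x, E_EXP, N)

def lsb(x):
    return x & 1
```

Each output bit costs one evaluation of the permutation, which is the cost that a function with many simultaneously hardcore bits avoids.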


USING A LOSSY TDF. Using the fact that extractors are robust hardcore
functions for LTDFs, we get an instantiation of the Encrypt-with-Hardcore scheme
from LTDFs that is an alternative to the prior scheme of Boldyreva et al. [7] and
the concurrent work of Wee [40]. Our scheme requires an LTDF with residual
leakage s ≤ H_∞(X) - 2 log(1/ε) - r, where r is the number of random bits needed
in ℰ (or the length of a seed to a pseudorandom generator that can be used to
obtain those bits).
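As a worked example of this constraint, the helper below evaluates the largest admissible residual leakage for given parameters. The function name and the sample numbers are our own assumptions.

```python
import math

def max_residual_leakage(min_entropy_bits, eps, coins_bits):
    """Largest s allowed by s <= H_inf(X) - 2*log2(1/eps) - r."""
    return min_entropy_bits - 2 * math.log2(1 / eps) - coins_bits

# Hypothetical numbers: 2048-bit uniform messages, eps = 2^-80, 128-bit PRG seed.
example = max_residual_leakage(2048, 2.0 ** -80, 128)
```

With these numbers the LTDF may leave at most 1760 bits of information about its input in lossy mode.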


USING 2-CORRELATED PRODUCT TDFs. Hemenway et al. [27] show a construction
of DE from a decisional 2-correlated product TDF, namely where F has the
property that f_1(x), f_2(x) is indistinguishable from f_1(x_1), f_2(x_2) where x_1, x_2
are sampled independently (in both cases for two independent public instances
f_1, f_2 of F). They show such a trapdoor function is a secure DE scheme for
uniform messages. To obtain an instantiation of EwHCore under the same
assumption, we can use F as the TDF, and an independent instance of the TDF
as hc. When a randomness extractor is applied to the latter, robustness follows
from Lemma 3, taking into account that the lemma holds even if the output of
the hardcore function is not uniform, as long as it has high HILL entropy.


USING ANY TDF WITH A LARGE HCF. Our most novel instantiations in the
single-message case come from considering TDFs that have a sufficiently large
HCF but are not necessarily lossy or an iterated TDP. Let us first consider
instantiations on the uniform message distribution. Freeman et al. [19] showed that
the Niederreiter TDF [32] has linearly many (simultaneous) hardcore bits
under the “Syndrome Decoding Assumption (SDA)” and “Indistinguishability
Assumption (IA)” (as defined in [19, Section 7.2]). Furthermore, the RSA [37] and
Paillier [34] TDPs have linearly many hardcore bits under certain computational
assumptions, namely the “Small Solutions RSA (SS-RSA) Assumption” [39] and
the “Bounded Computational Composite Residuosity (BCCR) Assumption” [9]
respectively. Because these hardcore functions are sufficiently long, they can be
made robust via Lemma 3 and give us a linear number of robust hardcore bits,
enough to use as randomness for ℰ (expanded by a pseudorandom generator if
necessary). Thus, by Theorem 3 we obtain:


Corollary 2. Under SDA+IA for the Niederreiter TDF, DE for the uniform
message distribution exists. Similarly, under SS-RSA for the RSA TDP or BCCR
for the Paillier TDP respectively, DE for the uniform message distribution exists.


In particular, the first statement provides the first DE scheme without random
oracles based on the hardness of syndrome decoding. (A scheme in the random
oracle model follows from [3].) Moreover, the schemes provided by the second
statement are nearly as efficient as the ones obtained from lossy TDFs (since they
do not use iteration), and the latter typically require decisional assumptions (in
contrast to the computational assumptions used here).

If we do not wish to rely on specific assumptions, we can also get DE from 
strong but general assumptions, such as sub-exponential hardness. We can also 
obtain DE for nonuniform message distributions (the strength of the assumption 
needed will depend on how far the entropy of the message space is from the 
maximum). See [20] for details. 


6 Bounded Multi-message Security and its Instantiations 


6.1 The New Notion and Variations 


THE NEW NOTION. Our notion of q-bounded multi-message security (or just
q-bounded security) for DE is quite natural, and can be viewed as analogous to
other forms of “bounded” security (see e.g. [12]). In a nutshell, it asks for security
on up to q arbitrarily correlated but high-entropy messages (where we allow
the public-key size to depend on q). Fix an encryption scheme Π = (𝒦, ℰ, 𝒟).
For q = q(k) and μ = μ(k), let 𝕄^{q,μ} be the class of distributions on message
vectors M^{μ,q} = (M_1^{μ,q}, ..., M_q^{μ,q}) where H_∞(M_i^{μ,q}) ≥ μ for all 1 ≤ i ≤ q
and M_1^{μ,q}, ..., M_q^{μ,q} are distinct with probability 1. We say that Π is q-bounded
multi-message PRIV (resp. IND) secure for μ-sources if it is PRIV (resp. IND)
secure for 𝕄^{q,μ}. By Theorem 1, PRIV on 𝕄^{q,μ} is equivalent to IND on 𝕄^{q,μ−2}.


UNBOUNDED MULTI-MESSAGE SECURITY FOR q-BLOCK SOURCES. We also
consider unbounded multi-message security for what we call a q-block source, a
generalization of a block-source [10] where every q-th message introduces some
“fresh” entropy. Fix an encryption scheme Π = (𝒦, ℰ, 𝒟). For q = q(k), n =
n(k), and μ = μ(k), let 𝕄^{q,n,μ} be the class of distributions on message vectors
M^{q,n,μ} = (M_1^{q,n,μ}, ..., M_{qn}^{q,n,μ}) such that H_∞(X_{qi+j} | X_1 = x_1, ..., X_{qi−1} =
x_{qi−1}) ≥ μ for all 1 ≤ i ≤ n, all 0 ≤ j ≤ q − 1, and all outcomes x_1, ..., x_{qi−1}
of X_1, ..., X_{qi−1}. We say that Π is q-bounded multi-message PRIV (resp. IND)
secure for (μ, n)-block-sources if Π is PRIV (resp. IND) secure on 𝕄^{q,n,μ}. Using
a similar argument to [7, Theorem 4.2], one can show equivalence of PRIV on
𝕄^{q,n,μ} to IND on 𝕄^{q,n,μ}.


6.2 Our Basic Scheme 


We cannot trivially achieve q-bounded security by running, say, q copies of a
scheme secure for one message in parallel (and encrypting the i-th message under
the i-th public key), since this approach would lead to a stateful scheme. The
main technical tool we use to achieve the notion is Lemma [B]. Combined with
[15, Lemma 2.2], this tells us that a 2q-wise independent hash function is robust on
correlated input distributions of sufficient min-entropy:


Proposition 1. For any q, let LTDF = (F, F') be an LTDF generator with
input length n and residual leakage s, and let H: 𝒦 × D → R where r = log |R|
be a 2q-wise independent hash function. Then H is a 2-robust hardcore function
for F on any input distribution X = (X_1, ..., X_q) such that H_∞(X) ≥ q(s +
r) + 2 log q + 2 log(1/ε) − 2 for negligible ε.
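
The proposition relies on a 2q-wise independent hash family but does not say which one. As an illustration only, the sketch below uses the textbook polynomial family: a uniformly random polynomial of degree less than t over Z_p (p prime) is t-wise independent. The field Z_p, the names `keygen`, `poly_hash` and `is_t_wise_independent`, and the tiny parameters are assumptions made for this example, not part of the construction above.

```python
import itertools
import secrets
from collections import Counter

def keygen(t, p):
    """Sample a key: the t coefficients of a polynomial of degree < t over Z_p."""
    return [secrets.randbelow(p) for _ in range(t)]

def poly_hash(key, x, p):
    """Evaluate the keyed polynomial at x modulo the prime p (Horner's rule)."""
    acc = 0
    for coeff in reversed(key):
        acc = (acc * x + coeff) % p
    return acc

def is_t_wise_independent(t, p):
    """Exhaustively check t-wise independence over all p**t keys (tiny p only)."""
    for xs in itertools.combinations(range(p), t):
        counts = Counter(
            tuple(poly_hash(key, x, p) for x in xs)
            for key in itertools.product(range(p), repeat=t)
        )
        # Uniformity on t distinct points: every output tuple is produced
        # by exactly one of the p**t keys.
        if len(counts) != p ** t or set(counts.values()) != {1}:
            return False
    return True
```

For q-bounded security one would take t = 2q; the exhaustive check is only feasible for toy parameters and is included to make the independence property concrete.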


By Theorem 3 we obtain a q-bounded DE scheme based on lossy trapdoor
functions that lose a 1 − O(1/q) fraction of their input. Specifically, we can use the
DDH-based construction of Peikert and Waters [35], the Paillier-based one of [7,19],
or the one from d-linear of [19] for any polynomial q.


6.3 Our Optimized Scheme 


We show that by extending some ideas of [7], we obtain a more efficient DE
scheme meeting q-bounded security that achieves better parameters.


INTUITION AND PRELIMINARIES. Intuitively, for the optimized scheme we
modify the scheme of [7] to first pre-process an input message using a 2q-wise
independent permutation (instead of pairwise as in [7]). However, there are two
issues to deal with here. First, for q > 1 such a permutation is not known to
exist (in an explicit and efficiently computable sense). Second, Lemma 2 applies
to t-wise independent functions rather than permutations.

To solve the first problem, we turn to 2q-wise “δ-dependent” permutations
(as constructed in e.g. [29]). Namely, say that a permutation H: 𝒦 × D → D is
t-wise δ-dependent if for all distinct x_1, ..., x_t ∈ D


Δ((H(K, x_1), ..., H(K, x_t)), (P_1, ..., P_t)) ≤ δ,


where K ←$ 𝒦 and P_1, ..., P_t are defined iteratively by taking P_1 to be uniform
on D and, for all 2 ≤ i ≤ t, taking P_i to be uniform on D \ {p_1, ..., p_{i−1}} where
p_1, ..., p_{i−1} are the outcomes of P_1, ..., P_{i−1} respectively.

To solve the second problem, we show that a t-wise δ-dependent permutation
is a t-wise δ′-dependent function where δ′ is a bit bigger than δ (see [20] for
details, where we also restate Lemma [B] in terms of δ-dependent permutations).
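
To make the definition of a t-wise δ-dependent permutation concrete, the following sketch computes the statistical distance exactly for a toy family. The affine maps x ↦ ax + b over Z_p with a ≠ 0 are a standard example of a pairwise independent permutation family; they are used here purely for illustration and are not the construction of [29]. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def affine_perm(key, x, p):
    """Toy keyed permutation of Z_p: x -> a*x + b with a != 0 (p prime)."""
    a, b = key
    return (a * x + b) % p

def dependence(xs, keys, perm, p):
    """Exact statistical distance between (H(K,x_1),...,H(K,x_t)) for a
    uniform key K and a uniform tuple of t distinct points of Z_p."""
    t = len(xs)
    counts = Counter(tuple(perm(k, x, p) for x in xs) for k in keys)
    targets = list(permutations(range(p), t))  # tuples of distinct points
    ideal = Fraction(1, len(targets))
    dist = sum(abs(Fraction(counts.get(y, 0), len(keys)) - ideal)
               for y in targets)
    # Probability mass on tuples with repeated entries (zero for permutations).
    dist += sum(Fraction(c, len(keys))
                for y, c in counts.items() if len(set(y)) < t)
    return dist / 2
```

With p = 5 and all 20 keys, the family is 2-wise 0-dependent, while on three points the distance is far from zero, which is why a richer family is needed for t = 2q > 2.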


THE CONSTRUCTION. We now detail our construction. Let LTDF = (F, F') be
an LTDF and let P: 𝒦 × {0,1}^k → {0,1}^k be an efficiently invertible family of
permutations on k bits. Define the associated deterministic encryption scheme
Π[LTDF, P] = (DK, DE, DD) with plaintext-space PtSp = {0,1}^k via


Alg DK(1^k):                      | Alg DE((f, K), x):  | Alg DD((f^{-1}, K), c):
  (f, f^{-1}) ←$ F(1^k); K ←$ 𝒦   |   c ← f(P(K, x))    |   x ← P^{-1}(K, f^{-1}(c))
  Return ((f, K), (f^{-1}, K))    |   Return c          |   Return x
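
The data flow of the three algorithms can be sketched as follows. This is a minimal illustration of how DK, DE and DD compose, not an implementation of the scheme: the stand-in trapdoor function (multiplication by a random odd number modulo 2^k) is neither lossy nor one-way, and the stand-in permutation (XOR with the key) is not 2q-wise δ-dependent. Both are assumptions chosen only so that the example runs; `make_de` and the `toy_*` names are hypothetical.

```python
import secrets

def make_de(f_gen, perm, perm_inv, key_space):
    """Build (DK, DE, DD) from a trapdoor-function generator and a keyed,
    efficiently invertible permutation."""
    def DK():
        f, f_inv = f_gen()
        K = secrets.choice(key_space)
        return (f, K), (f_inv, K)          # (public key, secret key)

    def DE(pk, x):
        f, K = pk
        return f(perm(K, x))               # c <- f(P(K, x)); no randomness used

    def DD(sk, c):
        f_inv, K = sk
        return perm_inv(K, f_inv(c))       # x <- P^{-1}(K, f^{-1}(c))

    return DK, DE, DD

# Toy stand-ins (NOT secure, NOT lossy): for exercising the data flow only.
BITS = 8
MOD = 1 << BITS

def toy_f_gen():
    a = secrets.randbelow(MOD // 2) * 2 + 1      # odd, hence invertible mod 2^k
    a_inv = pow(a, -1, MOD)                      # requires Python 3.8+
    return (lambda y: (a * y) % MOD), (lambda c: (a_inv * c) % MOD)

def toy_perm(K, x):
    return x ^ K

def toy_perm_inv(K, y):
    return y ^ K
```

Encryption is deterministic by design: the same public key and message always yield the same ciphertext, and decryption simply undoes the two layers in reverse order.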


We have the following result: 


Theorem 4. Suppose LTDF is a lossy trapdoor function on {0,1}^n with residual
leakage s, and let q, ε > 0. Suppose P is a 2q-wise δ-dependent permutation on
{0,1}^n for δ = t²/2^n. Then for any q-message IND adversary B ∈ 𝔻_{𝕄^{q,μ}} with
min-entropy μ ≥ qs + 2 log q + log(1/ε) + 5, there is an LTDF distinguisher D
such that for all k ∈ N


Adv^{ind}_{Π[LTDF,P],B}(k) ≤ Adv^{ltdf}_{LTDF,D}(k) + ε.

Furthermore, the running-time of D is the time to run B.


An efficiently invertible 2q-wise δ-dependent permutation on {0,1}^n for δ =
t²/2^n can be obtained from [29] using key length nt + log(1/δ) = n(t + 1) − 2 log t.
Comparing the above to Proposition 1, we see that we have dropped the r in
the entropy bound (indeed, there is no hardcore function here).


Abstract. We consider pseudorandom generators in which each output
bit depends on a constant number of input bits. Such generators have
appealingly simple structure: they can be described by a sparse
input-output dependency graph G and a small predicate P that is applied at
each output. Following the works of Cryan and Miltersen (MFCS ’01)
and Mossel et al. (FOCS ’03), we ask: which graphs and predicates
yield “small-bias” generators (that fool linear distinguishers)?

We identify an explicit class of degenerate predicates and prove the
following. For most graphs, all non-degenerate predicates yield small-bias
generators, f: {0,1}^n → {0,1}^m, with output length m = n^{1+ε} for some
constant ε > 0. Conversely, we show that for most graphs, degenerate
predicates are not secure against linear distinguishers, even when the
output length is linear m = n + Ω(n). Taken together, these results
expose a dichotomy: every predicate is either very hard or very easy, in
the sense that it either yields a small-bias generator for almost all graphs
or fails to do so for almost all graphs.

As a secondary contribution, we give evidence in support of the view 
that small bias is a good measure of pseudorandomness for local func- 
tions with large stretch. We do so by demonstrating that resilience to 
linear distinguishers implies resilience to a larger class of attacks for such 
functions. 


Keywords: small-bias generator, dichotomy, local functions, NC0.


1 Introduction 


In recent years there has been interest in the study of cryptographic primitives
that are implemented by local functions, that is, functions in which each output
bit depends on a constant number of input bits. This study has been in large
part spurred by the discovery that, under widely accepted cryptographic
assumptions, local functions can achieve rich forms of cryptographic functionality,
ranging from one-wayness and pseudorandom generation to semantic security
and existential unforgeability [6].

Local functions have simple structure: they can be described by a sparse 
input-output dependency graph and a sequence of small predicates applied at 
each output. Besides allowing efficient parallel evaluation, this simple structure 
makes local functions amenable to analysis, and gives hope for understanding 
their computational properties. Given that the cryptographic functionalities that 
local functions can achieve are quite complex, it is very interesting and appeal- 
ing to try to understand which properties of local functions (namely, graphs and 
predicates) are necessary and sufficient for them to implement such functionali- 
ties. 

In this work we focus on the study of local pseudorandom generators with large
stretch. We give evidence that for most graphs, all but a handful of “degenerate”
predicate types yield pseudorandom generators with output length m = n^{1+ε} for
some constant ε > 0. Conversely, we show that for almost all graphs, degenerate
predicates are not secure even against linear distinguishers. Taken together, these
results expose a dichotomy: every predicate is either very hard or very easy, in
the sense that it either yields a small-bias generator for almost all graphs or fails
to do so for almost all graphs.


1.1 Easy, Sometimes Hard, and Almost Always Hard Predicates 


Recall that a pseudorandom generator is a length-increasing function
f : {0,1}^n → {0,1}^m such that no efficiently computable test can distinguish
with noticeable advantage between the value f(x) and a randomly chosen
y ∈ {0,1}^m, when x ∈ {0,1}^n is chosen at random. The additive stretch of f is
defined to be the difference between its output length m and its input length n.

In the context of constructing local pseudorandom generators of superlinear
stretch, we may assume without loss of generality that all outputs apply the same
predicate P: {0,1}^d → {0,1}.¹ We are interested in understanding which d-local
functions f_{G,P}: {0,1}^n → {0,1}^m, described by a graph G and a predicate P,
are pseudorandom generators. For a predicate P, we will say


— P is easy if f_{G,P} is not pseudorandom for every G (against a given class of
adversaries),

— P is sometimes hard if f_{G,P} is pseudorandom for some G, and

— P is almost always hard if f_{G,P} is pseudorandom for a 1 − o(1) fraction of
graphs G.²


¹ If this is not the case, project on the outputs labeled by the most frequent predicate.
This decreases the stretch only by a constant factor as there are only 2^{2^d} different
predicates.

² One cannot hope for always hard predicates, for which f_{G,P} is pseudorandom for
all graphs, as there are simple examples of “easy” graphs G for which f_{G,P} fails to
be pseudorandom regardless of P.


Cryan and Miltersen [17] and Mossel et al. [27] identified several classes of
predicates that are easy for polynomial time algorithms when the stretch is a
sufficiently large linear function. These include four types of predicates:


1. linear predicates, i.e., P(w) = b + Σ_i w_i (mod 2) where b ∈ {0,1},
2. unbalanced predicates, i.e., Pr_w[P(w) = 1] ≠ 1/2,
3. predicates that are biased towards one input, i.e., Pr_w[P(w) = w_i] ≠ 1/2,
4. predicates that are biased towards a pair of inputs, i.e., Pr_w[P(w) = w_i + w_j
   (mod 2)] ≠ 1/2.


We call such predicates degenerate. It turns out that all predicates of locality at 
most 4 are degenerate. 
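
The four degeneracy conditions can be checked mechanically on a truth table. The sketch below is an illustration under two assumptions that are not fixed by the text: "linear" is read as "affine over F_2 in any subset of the inputs", and the function name `classify` and its labels are made up for this example. The 5-bit predicate in the usage note is only a sample input.

```python
from itertools import combinations, product

def classify(P, d):
    """Return the first degeneracy type exhibited by P (a 0/1-valued function
    of a d-tuple of bits, d >= 1), or None if P is non-degenerate."""
    inputs = list(product((0, 1), repeat=d))
    values = [P(w) for w in inputs]
    half = len(inputs) // 2

    # (1) Linear: P(w) = b + sum_{i in S} w_i (mod 2). Read b and S off the
    # all-zero and unit inputs, then confirm the guess on every input.
    b = P((0,) * d)
    coeffs = [P(tuple(int(j == i) for j in range(d))) ^ b for i in range(d)]
    if all(v == (b + sum(c & x for c, x in zip(coeffs, w))) % 2
           for w, v in zip(inputs, values)):
        return "linear"

    # (2) Unbalanced: Pr_w[P(w) = 1] != 1/2.
    if sum(values) != half:
        return "unbalanced"

    # (3) Biased towards one input: Pr_w[P(w) = w_i] != 1/2 for some i.
    for i in range(d):
        if sum(v == w[i] for w, v in zip(inputs, values)) != half:
            return "biased towards one input"

    # (4) Biased towards a pair: Pr_w[P(w) = w_i + w_j mod 2] != 1/2.
    for i, j in combinations(range(d), 2):
        if sum(v == w[i] ^ w[j] for w, v in zip(inputs, values)) != half:
            return "biased towards a pair of inputs"

    return None
```

For instance, `classify(lambda w: w[0] ^ w[1] ^ w[2] ^ (w[3] & w[4]), 5)` returns `None`, while majority on three bits is reported as biased towards one input. Looping `classify` over all truth tables of a given small locality is a direct way to test claims about which localities admit non-degenerate predicates.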

On the positive side, Mossel et al. [27] also gave examples of 5-bit predicates 
that are sometimes (exponentially) hard against linear distinguishers. Apple- 
baum et al. [5] show that when the locality is sufficiently large, almost always 
hard predicates against linear distinguishers exist. 

Pseudorandomness against linear distinguishers means that there is no subset
of output bits whose XOR has noticeable bias. This notion, due to Naor and
Naor [28], was advocated in the context of local pseudorandom generators by
Cryan and Miltersen [17]. A bit more formally, for a function
f : {0,1}^n → {0,1}^m, we let


bias(f) = max_L |Pr[L(f(U_n)) = 1] − Pr[L(U_m) = 1]|,


where the maximum is taken over all affine functions L : F_2^m → F_2. A small-bias
generator is a function f for which bias(f) is small (preferably negligible) as a
function of n.
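
For very small n and m the quantity bias(f) can be computed by brute force, which may help in experimenting with the definition. The sketch below is illustrative; the name `bias` and the tuple-based encoding of bit strings are choices made for this example. It uses the fact that a non-constant affine test L(y) = ⟨a, y⟩ + c outputs 1 on uniform y with probability exactly 1/2, that constant tests contribute nothing, and that the constant c does not change the absolute difference, so only the nonzero linear parts a need to be enumerated.

```python
from fractions import Fraction
from itertools import product

def bias(f, n, m):
    """Exact bias of f: {0,1}^n -> {0,1}^m (bit tuples in, bit tuples out),
    maximised over all non-constant linear tests a in F_2^m."""
    outputs = [f(x) for x in product((0, 1), repeat=n)]
    best = Fraction(0)
    for a in product((0, 1), repeat=m):
        if not any(a):
            continue  # the constant test has no distinguishing advantage
        ones = sum(sum(ai & yi for ai, yi in zip(a, y)) % 2 for y in outputs)
        best = max(best, abs(Fraction(ones, len(outputs)) - Fraction(1, 2)))
    return best
```

The running time is 2^{n+m} up to polynomial factors, so this is only a tool for toy examples, not a way to certify generators of interesting size.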


1.2 Our Results 


We fully classify predicates by showing that all predicates that are not known
to be easy are almost always hard.


Theorem 1 (Non-degenerate predicates are hard). Let P : {0,1}^d →
{0,1} be any non-degenerate predicate. Then, for every ε < 1/4 and m = n^{1+ε}:


Pr[bias(f_{G,P}) < δ(n)] > 1 − o(1),


where δ(n) = exp(−Ω(n^{1/4−ε})) and G is randomly chosen from all d-regular
hypergraphs with n nodes (representing the inputs) and m hyperedges (representing
the outputs).


The theorem shows that, even when locality is large, the only easy predicates
are degenerate ones, and there are no “sources of easiness” other than ones
that already appear in predicates of locality 4 or less.

Conversely, we show that degenerate predicates are easy for linear
distinguishers (as opposed to general polynomial-time distinguishers).


A Dichotomy for Local Small-Bias Generators 603 


Theorem 2 (Linear tests break degenerate predicates). For every m =
n + Ω(n), and every degenerate predicate P : {0,1}^d → {0,1},


Pr[bias(f_{G,P}) > Ω(1/log(n))] > 1 − o(1),


where G is randomly chosen from all d-regular hypergraphs with n nodes and m
hyperedges.


The proof of Theorem 2 mainly deals with degenerate predicates that are
correlated with a pair of their inputs. In this case, we show that the non-linear
distinguisher used in previous work, which was based on a semi-definite
program for MAX-2-LIN [21], can be replaced with a simple linear distinguisher.
(The proof for other degenerate predicates follows from previous works.)

Taken together, Theorems 1 and 2 expose a dichotomy: a predicate can be
either easy (fail for almost all graphs) or hard (succeed for almost all graphs).
One possible interpretation of our results is that, from a designer's point of view,
a strong emphasis should be put on the choice of the predicate, while the choice 
of the input-output dependency graph may be less crucial (since if the predicate 
is appropriately chosen then most graphs yield a small-bias generator). In some 
sense, this means that constructions of local pseudorandom generators with large 
stretch are robust: as long as the graph G is “typical,” any non-degenerate 
predicate can be used (our proof classifies explicitly what is a typical family of 
graphs and in addition shows that even a mixture of different non-degenerate 
predicates would work). 


1.3 Why Polynomial Stretch?


While Applebaum et al. [6] give strong evidence that local pseudorandom
generators exist, the stretch their construction achieves is only sublinear, that is,
m = n + n^{1−ε}. (This stretch can be achieved even for 4-local predicates, which
are necessarily degenerate.) In contrast, the regime of large (polynomial or even
linear) stretch is not as well understood, and the only known constructions are
based on non-standard assumptions. (See Section 1.5.)

Local generators of large stretch have several applications in cryptography and
complexity, such as secure computation with constant overhead [24] and strong
(average-case) inapproximability results for constraint-satisfaction problems [7].
These results are not known to follow from other (natural) assumptions. It should
be mentioned that it is possible to convert small polynomial stretch of m =
n^{1+ε} into arbitrary (fixed) polynomial stretch of m = n^c at the expense of
constant blow-up in the locality. (This follows from standard techniques; see [4]
for details.) Hence, it suffices to focus on the case of m = n^{1+ε} for some fixed ε.

The proof of Theorem 1 yields exponentially small bias when m = O(n), and
sub-exponential bias for m = n^{1+ε} where ε < 1/4. We do not know whether this
is tight, but it can be shown that some non-degenerate predicates become easy
(to break on a random graph) when the output length is m = n^2 or even m =
n^{3/2}. In general, it seems that when m grows the number of hard predicates of
locality d decreases, till the point m* where all predicates become easy. (By [27],
m* ≤ n^{d/2}.) It will be interesting to obtain a classification for larger output
lengths, and to find out whether a similar dichotomy happens there as well.


1.4 Why Small-Bias? 


Small-bias generators are a strict relaxation of cryptographic pseudorandom
generators in that the tests L : F_2^m → F_2 are restricted to be affine (as opposed to
arbitrary efficiently computable functions). Even though affine functions are,
in general, fairly weak distinguishers, handling them is a necessary first step
towards achieving cryptographic pseudorandomness. In particular, affine functions
are used extensively in cryptanalysis and security against them already rules out
an extensive class of attacks.

For local pseudorandom generators with linear stretch, Cryan and Miltersen
conjectured that affine distinguishers are as powerful as polynomial-time
distinguishers [17]. In Section 5 we attempt to support this view by showing that
small bias, by itself, leads to robustness against other classes of attacks.

Small-bias generators are also motivated in their own right, being used as
building blocks in constructions that give stronger forms of pseudorandomness.
This includes constructions of local cryptographic pseudorandom generators [7,4],
as well as pseudorandom generators that fool low-degree polynomials [14],
small-space computations [23], and read-once formulas [11].


1.5 Related Work 


The function f_{G,P} was introduced by Goldreich [22] who conjectured that when
m = n, one-wayness should hold for a random graph and a random predicate.
This view is supported by the results of [2229362612025] who show that a
large class of algorithms (including ones that capture DPLL-based heuristics)
fail to invert f_{G,P} in polynomial-time.

In the linear regime, i.e., when m = n + Ω(n), it is shown in [12] that if the
predicate is degenerate the function f_{G,P} can be inverted in polynomial-time.
(This strengthens earlier results, which only give distinguishers.) Recently,
a strong self-amplification theorem was proved in [13] showing that for m =
n + Ω_d(n), if f_{G,P} is hard-to-invert over a tiny (sub-exponentially small) fraction
of the inputs with respect to sub-exponential time algorithms, then the same
function is actually hard-to-invert over almost all inputs (with respect to
sub-exponential time algorithms).

Pseudorandom generators with sub-linear stretch can be implemented by
4-local functions based on standard intractability assumptions (e.g., hardness of
factoring, discrete-log, or lattice problems) [6], or even by 3-local functions based
on the intractability of decoding random linear codes [8]. However, it is unknown
how to extend this result to polynomial or even linear stretch since all known
stretch amplification procedures introduce a large (polynomial) overhead in the
locality. In fact, for the special case of 4-local functions (in which each
output depends on at most 4 input bits), there is a provable separation: although
such functions can compute sub-linear pseudorandom generators [6], they cannot
achieve polynomial stretch [17,27].

Alekhnovich [1] conjectured that for m = n + O(n), the function f_{G,P} is
pseudorandom for a random graph when P is a randomized predicate which
computes z_1 ⊕ z_2 ⊕ z_3 and with some small probability p < 1/2 flips the result.
Although this construction does not lead directly to a local function (due to the
use of noise), it was shown in [7] that it can be derandomized and transformed
into a local construction with linear stretch. (The restriction to linear stretch
holds even if one strengthens Alekhnovich’s assumption to m = poly(n).)

More recently, [4] showed that the pseudorandomness of f_{G,P} with respect
to a random graph and output length m can be reduced to the one-wayness of
f_{H,P} with respect to a random graph H and related output length m′ (for certain
settings of the stretch and security parameters). The current paper complements
this result as it provides a criterion for choosing the predicate P.


2 Techniques and Ideas 


In this section we give an overview of the proof of Theorem 1. Let
f : {0,1}^n → {0,1}^m be a d-local function where each output bit is computed by
applying some d-local predicate P : {0,1}^d → {0,1} to an (ordered) subset of the
inputs S ⊆ [n]. Any such function can be described by a list of m d-tuples
G = (S_1, ..., S_m) and the predicate P. Under this convention, we let
f_{G,P} : {0,1}^n → {0,1}^m denote the corresponding d-local function.
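
The convention above translates directly into code. The sketch below evaluates f_{G,P} on bit tuples and samples a graph as a list of m ordered d-tuples. Sampling each tuple with distinct nodes, independently and uniformly, is an assumption about the random-graph model made for this example; `f_GP` and `random_graph` are illustrative names.

```python
import random

def f_GP(G, P, x):
    """Evaluate the d-local function f_{G,P}: output j is P applied to the
    inputs of x indexed by the j-th (ordered) hyperedge of G."""
    return tuple(P(tuple(x[i] for i in S)) for S in G)

def random_graph(m, n, d, rng):
    """Sample m ordered d-tuples of distinct nodes from {0, ..., n-1}."""
    return [tuple(rng.sample(range(n), d)) for _ in range(m)]
```

For example, with G = [(0, 1, 2), (2, 3, 0)] and P the XOR of its three inputs, the input x = (1, 0, 1, 1) is mapped to (0, 1).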

We view G as a d-regular hypergraph with n nodes (representing inputs) and
m hyperedges (representing outputs) each of size d. (We refer to such a graph
as an (m,n,d)-graph.) Since we are mostly interested in polynomial stretch we
think of m as n^{1+ε} for some fixed ε > 0, e.g., ε = 0.1.

We would like to show that for almost all (m,n,d)-graphs G, the function f_{G,P}
fools all linear tests L, where P is non-degenerate. Following [27], we distinguish
between light linear tests which depend on less than k = Ω(n^{1−2ε}) outputs, and
heavy tests which depend on more than k outputs.

From our definition of non-degenerate predicates, it immediately follows that
such predicates P satisfy two forms of “non-linearity”: (1) (2-resilience) P is
uncorrelated with any linear function in two or fewer inputs; and (2) (algebraic
nonlinearity) P is not linear as a polynomial over F_2. Both properties are classical
design criteria which are widely used in practical cryptanalysis (cf. [30]). We use
the first property to fool light linear tests (tests that depend on a small number
of outputs) and the second one to fool heavy linear tests (tests that depend on
a large number of outputs).


2.1 Fooling Light Tests 


Our starting point is a result of [27] which shows that if the predicate is the
parity predicate ⊕ and the graph is a good expander, the output of f_{G,⊕}(U_n)
perfectly fools all light linear tests. In terms of expectation, this can be written
as


E[L(f_{G,⊕}(x))] = 0,


where we think of {0,1} as {±1}, and let L : {±1}^m → {±1} be a light linear
test. Our key insight is that the case of a general predicate P can be reduced to
the case of linear predicates.

More precisely, let ξ denote the outcome of the test L(f_{G,P}(x)). Then, by
looking at the Fourier expansion of the predicate P, we can write ξ as a
convex combination over the reals of exponentially many summands of the form
ξ_i = L(f_{G_i,⊕}(x)) where the G_i’s are subgraphs of G in the sense that the j-th
hyperedge of G_i is a subset of the j-th hyperedge of G. (The exact structure of
G_i is determined by the Fourier representation of P.) When x is uniformly
chosen, the random variable ξ is a weighted sum (over the reals) of many dependent
random variables ξ_i. However, if all the subgraphs are good expanders, the
expectation of each summand ξ_i is zero, and so, by the linearity of expectation,
the expectation of ξ is also zero.

It turns out that when the predicate is 2-resilient the size of each hyperedge
of G_i is at least 3, and therefore if every 3-uniform subgraph of G is a good
expander f_{G,P} (perfectly) passes all light linear tests. Fortunately, it turns out
that most graphs G satisfy this property. We emphasize that the argument
crucially relies on the perfect bias of XOR predicates, as there are exponentially
many summands. (See Section 3.1 for full details.)


2.2 Fooling Heavy Tests 


Consider a heavy test which involves t ≥ k outputs. Switching back to zero-one
notation, assume that the test outputs the value ξ = P(x_{S_1}) + ... + P(x_{S_t})
(mod 2) where x ← U_n. Our goal is to show that ξ is close to a fair coin. For
this it suffices to show that the sum ξ can be rewritten as the sum (over F_2) of
ℓ random variables

ξ = ξ_1 + ... + ξ_ℓ (mod 2),    (1)


where each random variable ξ_i is an independent non-constant coin, i.e.,
Pr[ξ_i = 1] ∈ [2^{−d}, 1 − 2^{−d}]. In this case, the statistical distance between ξ
and a fair coin is exponentially small (in ℓ), and we are done as long as ℓ is
large enough.
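
The claim that a sum of independent non-constant coins is close to fair follows from the identity E[(−1)^ξ] = ∏_i (1 − 2 Pr[ξ_i = 1]) for independent ξ_i. The sketch below checks this identity against direct enumeration and evaluates the resulting distance bound, assuming (as an illustration) that each coin satisfies Pr[ξ_i = 1] ∈ [2^{−d}, 1 − 2^{−d}]. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def xor_is_one(ps):
    """Pr[xi_1 + ... + xi_l = 1 (mod 2)] for independent coins with
    Pr[xi_i = 1] = ps[i], via E[(-1)^xi] = prod_i (1 - 2 p_i)."""
    prod_ = Fraction(1)
    for p in ps:
        prod_ *= 1 - 2 * p
    return (1 - prod_) / 2

def xor_is_one_bruteforce(ps):
    """The same probability by summing over all outcomes of odd parity."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(ps)):
        if sum(bits) % 2 == 1:
            pr = Fraction(1)
            for b, p in zip(bits, ps):
                pr *= p if b else 1 - p
            total += pr
    return total

def distance_bound(d, l):
    """Upper bound (1/2) * (1 - 2^(1-d))^l on |Pr[xi = 1] - 1/2| when every
    coin has Pr[xi_i = 1] in [2^-d, 1 - 2^-d]."""
    return Fraction(1, 2) * (1 - Fraction(2, 2 ** d)) ** l
```

Since d is a constant, the bound decays exponentially in ℓ, which is the sense in which the test output is close to a fair coin.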

In order to partition ξ, let us look at the hyperedges S_1, ..., S_t which are
involved in the test. As a first attempt, let us collect ℓ distinct “independent”
hyperedges that do not share a single common variable. Renaming the edges, we
can write ξ as


(P(x_{T_1}) + ... + P(x_{T_ℓ})) + (P(x_{S_1}) + ... + P(x_{S_{t−ℓ}})) (mod 2),


where the first ℓ random variables are indeed statistically independent. However,
the last t − ℓ hyperedges violate statistical independence as they may be
correlated with more than one of the first ℓ hyperedges. This is the case, for example,
if S_1 has a non-empty intersection with both T_1 and T_2.

This problem is fixed by collecting ℓ “strongly-independent” hyperedges T_1,
..., T_ℓ for which every S_j intersects at most a single T_i. (Such a large collection
is likely to exist since t is sufficiently large.) In this case, for any fixing of the
variables outside the T_i’s, the random variable ξ can be partitioned into ℓ
independent random variables of the form ξ_i = P(x_{T_i}) + Σ P(x_{S_j}), where the sum
ranges over the S_j’s which intersect T_i. This property (which is a relaxation of
Eq. (1)) still suffices to achieve our goal, as long as the ξ_i’s are non-constant.

To prove the latter, we rely on the fact that P has algebraic degree 2.
Specifically, let us assume that S_j and T_i have no more than a single common input
node. (This condition can be typically met at the expense of throwing away a small
number of the T_i’s.) In this case, the random variable ξ_i = P(x_{T_i}) + Σ P(x_{S_j})
cannot be constant, as the first summand is a degree 2 polynomial in x_{T_i} and
each of the last summands contains at most a single variable from T_i. Hence, ξ_i
is a non-trivial polynomial whose degree is lower-bounded by 2. This completes
the argument. Interestingly, non-linearity is used only to prove that the ξ_i’s are
non-constant. Indeed, linear predicates fail exactly for large tests for which the
ξ_i’s become fixed due to local cancellations. (See Section 3.2 for details.)


2.3 Proving Theorem 2


When P is a degenerate predicate and G is random, the existence of a linear
distinguisher follows by standard arguments. The cases of linear or biased P
are trivial, and the case of bias towards one input was analyzed by Cryan and
Miltersen. When P is biased towards a pair of inputs, say the first two, we think
of P as an “approximation” of the parity x_1 ⊕ x_2 of its first two inputs. If P
happened to be the predicate x_1 ⊕ x_2, one could find a short “cycle” of output
bits that, when XORed together, causes the corresponding input bits to cancel
out. In general, as long as the outputs along the cycle do not share any additional
input bits, the output of the test will be biased, with bias exponential in the
length of the cycle. In Section 4 we show that a random G is likely to have such
short cycles, and so the corresponding linear test will be biased.
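
The cycle argument can be checked exhaustively on a toy instance. In the sketch below, the 4-bit predicate w_0 ⊕ w_1 ⊕ (w_2 ∧ w_3) is a hypothetical example of a predicate that agrees with the parity of its first two inputs with probability 3/4, and the three hyperedges form a cycle of length 3 on nodes 0, 1, 2 with no other shared inputs. Neither the predicate nor the graph is taken from the paper.

```python
from fractions import Fraction
from itertools import product

def pair_biased_predicate(w):
    """Hypothetical predicate agreeing with w[0] ^ w[1] with probability 3/4."""
    return w[0] ^ w[1] ^ (w[2] & w[3])

# The first two inputs of the three outputs form the cycle 0-1, 1-2, 2-0;
# the remaining inputs (nodes 3..8) are private to each output.
CYCLE = [(0, 1, 3, 4), (1, 2, 5, 6), (2, 0, 7, 8)]
N = 9

def cycle_test_zero_probability(P, edges, n):
    """Exact Pr_x[XOR of the outputs on `edges` = 0], enumerating all x."""
    zeros = 0
    for x in product((0, 1), repeat=n):
        parity = 0
        for S in edges:
            parity ^= P(tuple(x[i] for i in S))
        zeros += parity == 0
    return Fraction(zeros, 2 ** n)
```

Along the cycle the parity terms cancel and only the three independent "error" terms w_2 ∧ w_3 remain, each with E[(−1)^{error}] = 1/2, so the test outputs 0 with probability (1 + (1/2)^3)/2 = 9/16. The deviation from 1/2 shrinks geometrically with the cycle length, in line with the text.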


3 Non-degenerate Predicates Are Hard 


In this section we prove Theorem 1. We follow the outline described in Section 2
and handle light linear tests and heavy linear tests separately.


3.1 Fooling Light Tests 


In this section we show that if the predicate P is 2-resilient (see definition below)
and the graph G is a good expander, the function f_{G,P} is k-wise independent,
and in particular fools linear tests of weight smaller than k. We will need the
following definitions.


Super expansion. Let G be an (m,n,d)-graph. A graph H is a (k,a)-subgraph of
G if it can be constructed by choosing ℓ ≤ k distinct hyperedges of G and for
each selected hyperedge S_i removing some of the nodes while leaving b_i ≥ a
nodes. We say that G is a (k,a)-super-expander if the hyperedges T = T_1, ..., T_ℓ
of every (k,a)-subgraph H of G touch more than bℓ/2 nodes where b = Σ_i |T_i| / ℓ
is the average cardinality of the hyperedges of H. We say that G is (k,a)-linear
if the hyperedges of every (k,a)-subgraph of G are linearly independent viewed
as vectors in F_2^n.
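
For a single candidate subgraph, both conditions in the definition are easy to test. The sketch below checks the expansion inequality and linear independence over F_2 for one given list of hyperedges, using Gaussian elimination on bitmasks. It does not enumerate all (k,a)-subgraphs, which would take exponential time; the function names are illustrative.

```python
def expands(edges):
    """Check that the hyperedges touch more than b*l/2 nodes, where b*l is
    the sum of their cardinalities (equivalently 2*|nodes| > sum |T_i|)."""
    touched = set().union(*map(set, edges))
    return 2 * len(touched) > sum(len(set(e)) for e in edges)

def gf2_rank(masks):
    """Rank over F_2 of vectors given as integer bitmasks."""
    basis = {}  # leading bit position -> basis vector with that leading bit
    rank = 0
    for v in masks:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                rank += 1
                break
            v ^= basis[lead]  # eliminate the leading bit and keep reducing
    return rank

def linearly_independent(edges):
    """View each hyperedge as its indicator vector in F_2^n and test whether
    the vectors are linearly independent."""
    masks = [sum(1 << v for v in set(e)) for e in edges]
    return gf2_rank(masks) == len(edges)
```

As a sanity check, the triangle [(0, 1), (1, 2), (0, 2)] neither expands nor is independent (its edges sum to zero), whereas [(0, 1, 2), (2, 3, 4), (4, 5, 0)] satisfies both conditions.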


Fourier coefficients. For a set T ⊆ [d], let χ_T : {±1}^d → {±1} be the parity
function defined by (x_1, ..., x_d) ↦ (−1)^{Σ_{i∈T} x_i}. It is well known that every
predicate P : {±1}^d → {±1} can be expressed as a convex combination of
parities, i.e., P(x) = Σ_{T⊆[d]} α_T χ_T(x) where α_T ∈ ℝ. The predicate is a-resilient
if α_T is zero for every set T of size smaller or equal to a.
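
The Fourier coefficients and the resilience condition can be computed directly for small d. In the sketch below the predicate is given in zero-one notation and mapped to ±1 via b ↦ (−1)^b, so that α_T = E_w[(−1)^{P(w) + Σ_{i∈T} w_i}]; this encoding and the names `fourier_coefficients` and `is_resilient` are choices made for the example.

```python
from fractions import Fraction
from itertools import combinations, product

def fourier_coefficients(P, d):
    """Map each subset T of {0, ..., d-1} (as a sorted tuple) to
    alpha_T = E_w[(-1)^(P(w) + sum_{i in T} w_i)] for a 0/1-valued P."""
    inputs = list(product((0, 1), repeat=d))
    coeffs = {}
    for size in range(d + 1):
        for T in combinations(range(d), size):
            total = sum((-1) ** (P(w) + sum(w[i] for i in T)) for w in inputs)
            coeffs[T] = Fraction(total, len(inputs))
    return coeffs

def is_resilient(P, d, a):
    """True if alpha_T = 0 for every T of size at most a (including T = {})."""
    return all(c == 0
               for T, c in fourier_coefficients(P, d).items() if len(T) <= a)
```

For the sample predicate w_0 ⊕ w_1 ⊕ w_2 ⊕ (w_3 ∧ w_4), the only nonzero coefficients sit on sets of size at least 3, so it is 2-resilient but not 3-resilient.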


The following lemma shows that resiliency combined with (k,a)-linearity leads
to k-wise independence.


Lemma 1. If P is (a−1)-resilient and the (m,n,d)-graph G is (k,a)-linear then
f_{G,P} is a k-wise independent generator, i.e., the m r.v.’s (y_1, ..., y_m) = f_{G,P}(U_n)
are k-wise independent.


Proof. Fix ℓ ≤ k outputs of f_{G,P}, and let S_1, ..., S_ℓ be the corresponding
hyperedges. We should show that E_x[∏_i P(x_{S_i})] = 0. For every x ∈ {0,1}^n we
have:


∏_{i=1}^ℓ P(x_{S_i}) = ∏_{i=1}^ℓ Σ_{T_i ⊆ [d], |T_i| ≥ a} α_{T_i} χ_{T_i}(x_{S_i})
                   = Σ_{T=(T_1,...,T_ℓ), |T_i| ≥ a} ∏_i α_{T_i} χ_{S_{i,T_i}}(x),


where S_{i,{k_1,...,k_t}} denotes the set {S_{i,k_1}, ..., S_{i,k_t}} and S_{i,j} denotes the j-th
entry of the tuple S_i. Hence, by the linearity of expectation, it suffices to show
that


E_x[∏_{i=1}^ℓ χ_{S_{i,T_i}}(x)] = 0,


for every (T_1, ..., T_ℓ) where T_i ⊆ [d], |T_i| ≥ a. (Recall that the α_{T_i}’s are constants
and thus can be ignored.) Observe that ∏_i χ_{S_{i,T_i}}(x) is just a parity function,
which, by (k,a)-linearity, is non-constant. Since every non-constant parity
function is balanced (guaranteed to have zero expectation), the claim follows.


Next, we show that (k,a)-linearity is implied by super-expansion, and that a 
random graph is likely to be super-expanding. 


Lemma 2. Let d > 3 be a constant. Let A < yn/logn and3<a<d. 


1. Every (An, n, d)-graph which is (k, a)-super-expander is also (k, a)-linear. 
2. A random (An,n,d)-graph is whp an (an/A?, a)-super-expander where a is 
a constant that depends on a, dl 


3 With high probability (whp) means with probability 1 — 0(1) as n gets large. 
Proof. The proof of the first item parallels the standard relation between lossless
expansion and unique/odd-expansion. Let $G$ be a $(k,a)$-super-expander. Observe
that if $G$ is not $(k,a)$-linear then there must be a $(k,a)$-subgraph $H$ whose edges
sum up to zero (over $\mathbb{F}_2$). We argue that $G$ cannot have such a subgraph. Indeed,
by counting edges, in each $(k,a)$-subgraph $H$ the average degree of the
participating nodes is smaller than 2, and so there exists at least one node which
participates in a single hyperedge. Hence, the sum of the hyperedges (over $\mathbb{F}_2$)
is non-zero.
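
The counting step can be illustrated on a toy subgraph. The sketch below treats each hyperedge as a set of nodes (an assumption made for simplicity) and computes the sum over $\mathbb{F}_2$ as an iterated symmetric difference; the helper names and the example `H` are hypothetical.

```python
from collections import Counter
from functools import reduce

def f2_sum(hyperedges):
    """Sum of hyperedges over F_2: the nodes covered an odd number of times."""
    return reduce(lambda acc, e: acc ^ set(e), hyperedges, set())

def degree_one_nodes(hyperedges):
    """Nodes that participate in exactly one hyperedge."""
    deg = Counter(v for e in hyperedges for v in set(e))
    return {v for v, c in deg.items() if c == 1}

# Three hyperedges on six nodes: the average degree is 9/6 < 2.
H = [(1, 2, 3), (3, 4, 5), (5, 6, 1)]
print(degree_one_nodes(H))  # nodes of degree one
print(f2_sum(H))            # they survive in the F_2 sum
```

Any node of degree one appears in the sum, so the sum cannot be zero whenever such a node exists.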

To prove the second item, we calculate the probability that a random $(\Delta n, n, d)$-graph
fails to be a $(k,a)$-super-expander. First we bound the probability that there
exists a subgraph $H$ with $\ell$ hyperedges and average degree $b \ge a$ that violates
expansion. This probability is bounded by


(2) (un) (6) e 
(ya) 


where $c_{d,a}$ is a constant which depends on $d$ and $a$, and the second inequality
is due to $a \le b \le d$. Let us denote the above quantity by $p_{\ell,n,\Delta,a,d}$. By
a union bound, $G$ fails to be a $(k,a)$-super-expander with probability at most
$\sum_{0 < \ell \le k} p_{\ell,n,\Delta,a,d}$.

Let us fix a > 3, and assume that A < n?/logn and k = an/A? where 
a = 1/(2ca,)* is a constant. Indeed, in this case 


t t 
Be pe N N a a aE 
Pe S d,a Vn S da Togn : 
Observe that for £ = 1,2,3, the quantity pe is o(1), for 4 < £ < 10logn the 
quantity pe < O(1/log?n) and for 10logn < £ < an/A? the quantity py is at 


most O(1/n1°). It follows that each of these three intervals contributes o(1) to 
the overall failure probability. 


By combining the lemmas, we obtain the following corollary. 


Corollary 1. If $P$ is 2-resilient and $m = \Delta n$ for constant $\Delta$, then whp over
the choice of an $(m,n,d)$-graph $G$, the function $f_{G,P}$ is $k$-wise independent for
$k = \Omega(n)$. If $\Delta = n^{\varepsilon}$, the above holds with $k = \Omega(n^{1-2\varepsilon})$.


By taking $\varepsilon < 1/4$, 2-resiliency suffices for $\omega(\sqrt{n})$-wise independence whp.
3.2 Fooling Heavy Tests


In this section we show that if the predicate P is non-linear and the graph G has 
large sets of “independent” hyperedges, the function $f_{G,P}$ fools linear tests of
weight larger than k. Formally, we will need the following notion of independence. 


$(k,\ell,b)$-independence. Let $\mathcal{S}$ be a collection of $k$ distinct hyperedges. A subset
$\mathcal{T} \subseteq \mathcal{S}$ of $\ell$ distinct hyperedges is an $(\ell,b)$-independent set of $\mathcal{S}$ if the following
two properties hold: (1) Every pair of hyperedges $(T, T') \in \mathcal{T}$ is of distance at
least 2, namely, for every pair $T_i \ne T_j \in \mathcal{T}$ and $S \in \mathcal{S}$,

$$T_i \cap S = \emptyset \quad \text{or} \quad T_j \cap S = \emptyset;$$

and (2) For every $T_i \in \mathcal{T}$ and $S \ne T_i$ in $\mathcal{S}$ we have

$$|T_i \cap S| < b.$$

A graph is $(k,\ell,b)$-independent if every set of hyperedges of size larger than $k$
has an $(\ell,b)$-independent set.
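
The two conditions of the definition translate directly into a checker. The following is a minimal sketch that assumes hyperedges can be compared as sets of nodes; the function name `is_independent_set` and the example collection are hypothetical.

```python
from itertools import combinations

def is_independent_set(collection, subset, b):
    """Check whether `subset` is an (len(subset), b)-independent set of `collection`."""
    S = [frozenset(e) for e in collection]
    T = [frozenset(e) for e in subset]
    if not all(t in S for t in T):
        return False
    # (1) No hyperedge of the collection touches two distinct members of the subset.
    for t1, t2 in combinations(T, 2):
        if any((t1 & s) and (t2 & s) for s in S):
            return False
    # (2) Each member of the subset meets every other hyperedge in fewer than b nodes.
    return all(len(t & s) < b for t in T for s in S if s != t)

collection = [(1, 2, 3), (3, 4, 5), (6, 7, 8), (8, 9, 10)]
print(is_independent_set(collection, [(1, 2, 3), (6, 7, 8)], b=2))
print(is_independent_set(collection, [(1, 2, 3), (3, 4, 5)], b=2))
```

The first candidate passes both conditions; the second fails condition (1) because its two members share node 3.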

Our key lemma shows that good independence and large algebraic degree 
guarantee resistance against heavy linear tests. 


Lemma 3. If $G$ is $(k,\ell,b)$-independent and $P$ has an algebraic degree of at least
$b$, then every linear test of size at least $k$ has bias of at most $\frac{1}{2}(1 - 2^{-d+1})^{\ell}$.

Proof. Fix some test $\mathcal{S} = (S_1,\ldots,S_k)$ of size $k$, and let $\mathcal{T} = (T_1,\ldots,T_\ell)$ be an
$(\ell,b)$-independent set of $\mathcal{S}$. Fix an arbitrary assignment $\sigma$ for all the input variables
which do not participate in any of the $T_i$'s and choose the other variables
uniformly at random. In this case, we can partition the output of the test $y$ into
$\ell$ summands over $\ell$ disjoint blocks of variables, namely

$$y = \sum_{i \in [k]} P(x_{S_i}) = \sum_{i \in [\ell]} z_i(x_{T_i}),$$

where the sum is over $\mathbb{F}_2$ and

$$z_i(x_{T_i}) = P(x_{T_i}) + \sum_{S :\, T_i \ne S,\; S \cap T_i \ne \emptyset} P(x_{S \cap T_i}, \sigma_{S \setminus T_i}).$$

We need two observations: (1) the random variables $z_i$ are statistically independent
(as each of them depends on a disjoint block of inputs); and (2) the
r.v. $z_i$ is non-constant and, in fact, it takes each of the two possible values with
probability at least $2^{-d}$. To prove the latter fact it suffices to show that $z_i(x)$ is
a non-constant polynomial (over $\mathbb{F}_2$) of degree at most $d$. Indeed, recall that $z_i$ is the
sum of the polynomial $P(x_{T_i})$ whose degree is in $[b, d]$, and polynomials of the
form $P(x_{S \cap T_i}, \sigma_{S \setminus T_i})$ whose degree is smaller than $b$ (as $|S \cap T_i| < b$). Therefore
the degree of $z_i$ is in $[b, d]$.

To conclude the proof, we note that the parity of $\ell$ independent coins, each
with expectation in $(\delta, 1 - \delta)$, has bias of at most $\frac{1}{2}(1 - 2\delta)^{\ell}$. (See, e.g., [27]).
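
The closing bound on the parity of independent coins can be checked exactly on a small instance. The sketch below uses exact rational arithmetic; the coin probabilities and $\delta = 1/4$ are arbitrary illustrative values, and bias is taken to mean $|\Pr[\text{parity} = 1] - 1/2|$, which is an assumption about the convention in use.

```python
from fractions import Fraction
from functools import reduce

def xor_probability(ps):
    """Exact Pr[c_1 xor ... xor c_l = 1] for independent coins with Pr[c_i = 1] = ps[i]."""
    return reduce(lambda q, p: q * (1 - p) + (1 - q) * p, ps, Fraction(0))

def bias(ps):
    """Distance of the parity from a fair coin."""
    return abs(xor_probability(ps) - Fraction(1, 2))

ps = [Fraction(1, 4), Fraction(3, 4), Fraction(1, 3)]  # each lies in [delta, 1 - delta]
delta = Fraction(1, 4)
bound = Fraction(1, 2) * (1 - 2 * delta) ** len(ps)

print(bias(ps), bound)
```

Here the bias equals $\frac{1}{2}\prod_i |1 - 2p_i| = 1/24$, which is below the bound $\frac{1}{2}(1-2\delta)^3 = 1/16$.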
We want to show that a random graph is likely to be $(k,\ell,2)$-independent.


Lemma 4. For every positive $\varepsilon$ and $\delta$, a random $(n^{1+\varepsilon}, n, d)$-graph is, whp,
$(n^{2\varepsilon+\delta}, n^{\delta/2}, 2)$-independent.


Proof. We will need the following claim. Call a hyperedge $S$ $b$-intersecting if
there exists another hyperedge $S'$ in the graph for which $|S' \cap S| \ge b$. We first
bound the number of $b$-intersecting hyperedges.


Claim. Let $b$ be a constant. Then, in a random $(m = n^{1+\varepsilon}, n, d)$-graph, whp,
the number of $b$-intersecting hyperedges is at most $n^{2(1+\varepsilon)-b} \log n$.


Hence, whp, at most $O(n^{2\varepsilon} \log n)$ of the hyperedges are 2-intersecting, and for
$\varepsilon < 1/4$ there are at most $o(\sqrt{n})$ such hyperedges.


Proof (of Claim). Let $X$ be the random variable which counts the number of
$b$-intersecting hyperedges. First, we bound the expectation of $X$ by $m^2 d^{2b}/n^b =
d^{2b} \cdot n^{2(1+\varepsilon)-b}$. To prove this, it suffices to bound the expected number of pairs
$S_i, S_j$ which $b$-intersect. Each such pair $b$-intersects with probability at most
$d^{2b}/n^b$, and so, by linearity of expectation, the expected number of intersecting
pairs is at most $m^2 d^{2b}/n^b$. Now, by applying Markov's inequality, we have
that $\Pr[X > n^{2(1+\varepsilon)-b} \log n] \le d^{2b}/\log n = o(1)$, and the claim follows. (A stronger
concentration can be obtained via a martingale argument.)


We can now prove Lemma 4. Assume, without loss of generality, that $\varepsilon > 1$
(as if the claim holds for some value of $\varepsilon$ it also holds for smaller values). First
observe that, whp, all the input nodes in $G$ have degree at most $2n^{\varepsilon}$, since, by
a multiplicative Chernoff bound, the probability that a single node has larger
degree is exponentially small in $n^{\varepsilon}$. We condition on this event and the event
that there are no more than $r = n^{2\varepsilon} \log n$ 2-intersecting edges. Fix a set of
$k = n^{2\varepsilon+\delta}$ hyperedges. We extract an $(\ell,2)$-independent set by throwing away
the 2-intersecting edges, and then by iteratively inserting a hyperedge $T$ into the
independent set and removing all the hyperedges $S$ that share a common node
with $T$, as well as the hyperedges which share a node with an edge that shares a node
with $T$. At the beginning we remove at most $r$ edges, and in each iteration
we remove at most $(d \cdot 2n^{\varepsilon})^2$ edges; hence there are at least $\ell \ge (k - r)/(2dn^{\varepsilon})^2 > n^{\delta/2}$
hyperedges in the independent set.
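
The extraction procedure in the proof is a greedy algorithm, which the following sketch mirrors on a toy collection. It assumes hyperedges are compared as sets and that "$b$-intersecting" is judged within the given collection rather than the whole graph; the function name and example are hypothetical.

```python
def greedy_independent_set(edges, b=2):
    """Greedy extraction: drop b-intersecting hyperedges, then repeatedly pick a
    hyperedge and discard everything within distance two of it."""
    edges = [frozenset(e) for e in edges]

    def is_b_intersecting(e):
        return any(f != e and len(e & f) >= b for f in edges)

    candidates = [e for e in edges if not is_b_intersecting(e)]
    chosen = []
    while candidates:
        t = candidates.pop(0)
        chosen.append(t)
        # Hyperedges sharing a node with t (this includes t itself).
        near = [s for s in edges if s & t]
        # Hyperedges sharing a node with some hyperedge that touches t.
        blocked = {u for u in edges if any(u & s for s in near)}
        candidates = [c for c in candidates if c not in blocked]
    return chosen

edges = [(1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9), (10, 11, 12), (1, 2, 13)]
print(greedy_independent_set(edges))
```

In this example the hyperedges (1, 2, 3) and (1, 2, 13) are 2-intersecting and are discarded first; picking (3, 4, 5) then blocks (5, 6, 7) and (7, 8, 9), leaving (10, 11, 12) as the second member.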


Combining the lemmas together we get: 

Corollary 2. Fix some positive $\varepsilon$ and $\delta$. If $P$ has an algebraic degree of at
least 2 and $m = n^{1+\varepsilon}$, then, whp over the choice of a random $(m,n,d)$-graph,
the function $f_{G,P}$ has at most sub-exponential bias (i.e., $\exp(-\Omega(n^{\delta}))$) against
linear tests of size at least $n^{2\varepsilon+2\delta}$.


By combining Corollaries 1 and 2 we obtain Theorem 1.


4 Linear Tests Break Degenerate Predicates 


In this section we prove Theorem 3. That is, we show that the assumptions that
P is non-linear and 2-resilient are necessary for P to be a hard predicate. Clearly 
the assumption that P is non-linear is necessary even when m = n+ 1. 
When $m \ge Kn$ for a sufficiently large constant $K$ (depending on $d$), it follows
from work of Cryan and Miltersen that if $P$ is not 1-resilient, then for
any $f\colon \{\pm 1\}^n \to \{\pm 1\}^m$, the output of $f$ is distinguishable from uniform with
constant advantage by some linear test. When $P$ is 1-resilient but not 2-resilient,
Mossel, Shpilka, and Trevisan show that $f$ is distinguishable from uniform by a
polynomial-time algorithm, but not by one that implements a linear test.

Here we show that if $P$ is not 2-resilient, then the output of $f_{G,P}$ is distinguishable
by linear tests with non-negligible advantage with high probability
over the choice of $G$.


Claim. Assume $P$ is unbiased and 1-resilient but $|\mathbb{E}[P(z) z_1 z_2]| = \alpha > 0$. Then
for every $\ell = o(\log n)$, with probability $1 - (2^{-\Omega(\ell)} + d\ell/n)$ over the choice of
$G$, there exists a linear test that distinguishes the output of $f_{G,P}$ from random
with advantage $\alpha^{\ell}$.


Proof. Let $H$ be the directed graph with vertices $\{1,\ldots,n\}$ where every hyperedge
$(i_1, i_2, \ldots, i_d)$ in $G$ induces the edge $(i_1, i_2)$ in $H$.

Let $\ell$ be the length of the shortest directed cycle in $H$ and without loss of
generality assume that this cycle consists of the inputs $1,2,\ldots,\ell$ in that order.
Let $z_i$ be the name of the output that involves inputs $i$ and $i+1$ for $i$ ranging from
1 to $\ell$ (where indices are taken modulo $\ell$) and $S_i$ the corresponding hyperedge. With
probability at least $1 - d\ell/n$, input $i$ does not participate in any hyperedge besides
$S_i$ and $S_{i-1}$, and all other inputs participate in at most one of the hyperedges
$S_1,\ldots,S_\ell$.

We now calculate the bias of the linear test that computes $z_1 \oplus \cdots \oplus z_\ell$.
For simplicity, we will assume that $d = 3$; larger values of $d$ can be handled
analogously but the notation is more cumbersome. We will denote the entries in
$S_i$ by $i$, $i+1$ and $i'$. Then the Fourier expansion of $z_i(x_{S_i})$ has the form

$$z_i(x_{S_i}) = \alpha x_i x_{i+1} + \beta x_i x_{i'} + \gamma x_{i+1} x_{i'} + \delta x_i x_{i+1} x_{i'}.$$

The Fourier expansion of the expression $\mathbb{E}[z_1(x_{S_1}) \cdots z_\ell(x_{S_\ell})]$ can be written as
a sum of $4^{\ell}$ products of different monomials participating in the above terms.
The only monomial that does not vanish is the one containing all the $\alpha$-terms,
namely

$$\mathbb{E}\Big[\prod_{i=1}^{\ell} \alpha x_i x_{i+1}\Big] = \alpha^{\ell}.$$

All the other products of monomials contain at least one unique term of the
form $x_{i'}$, and this causes the expectation to vanish.
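
The $\alpha^{\ell}$ computation can be verified by brute force on a short cycle. The sketch below uses $d = 4$ rather than the $d = 3$ assumed in the text for notational simplicity, with the hypothetical predicate $x_1 \oplus x_2 \oplus (x_3 \wedge x_4)$, which is unbiased and 1-resilient but has $\alpha = 1/2$. Each hyperedge on the cycle gets two private inputs $u_i, v_i$, matching the event that non-cycle inputs are not shared.

```python
from itertools import product

def P(x1, x2, x3, x4):
    """Hypothetical predicate x1 XOR x2 XOR (x3 AND x4), in +/-1 notation."""
    return (-1) ** (x1 ^ x2 ^ (x3 & x4))

def alpha():
    """Fourier coefficient E[P(x) * chi_{1,2}(x)] of the predicate."""
    s = sum(P(*x) * (-1) ** (x[0] ^ x[1]) for x in product((0, 1), repeat=4))
    return s / 16

def cycle_test_expectation(ell):
    """E[z_1 * ... * z_ell] with z_i = P(x_i, x_{i+1}, u_i, v_i), indices mod ell."""
    total = 0
    for bits in product((0, 1), repeat=3 * ell):
        x, u, v = bits[:ell], bits[ell:2 * ell], bits[2 * ell:]
        prod = 1
        for i in range(ell):
            prod *= P(x[i], x[(i + 1) % ell], u[i], v[i])
        total += prod
    return total / 2 ** (3 * ell)

print(alpha(), cycle_test_expectation(3), cycle_test_expectation(4))
```

For this predicate the expectation of the product is exactly $\alpha^{\ell}$, so the XOR of the outputs along the cycle is a linear test with noticeable advantage when $\ell$ is small.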

It remains to argue that with high probability @ is not too large. We show 
that with probability 1 — O((4/K)*), H has a directed cycle of length £, as long 
as £ < logy, (n/4). Let X denote the number of directed cycles of length £ in H. 
The number of potential directed cycles of length in H is n(n—1)...(n—€+1) > 
(n — €)°. Each of these occurs uniquely in H with probability 


(Kn)(Kn—1)...(Kn—£4 Way) l = aap) > =e 


n(n — n—1 n 
Therefore $\mathbb{E}[X] \ge (K/4)^{\ell}$. The variance can be upper bounded as follows. The
number of pairs of cycles of length £ that intersect in i edges is at most ({)n?4-*1, 
and the covariance of the indicators for these cycles is at most (K/n)?°". Adding 
all the covariances up as 7 ranges from 1 to @, it follows that 


Var[X] < ELX] O3 () a < E[X] + xK 


n n 


as long as £ < logs g (n/4). 


5 Small Bias vs. Cryptographic Security for Local Functions


It is not difficult to come up with examples of generators that have (expo- 
nentially) small bias against linear distinguishers but are not cryptographically 
secure. However, we do not know of any such examples of generators that are 
local and have at least linear stretch: To the best of our knowledge, all local 
functions of linear stretch that are known to implement small-biased generators 
could be pseudorandom generators against all polynomial-time adversaries. 

Therefore it may be plausible to conjecture that if P is almost always hard 
against linear adversaries, then P is almost always hard against polynomial- 
time adversaries. While this conjecture cannot be proven without resolving the 
existence of pseudorandom generators, we give evidence in support of it: We 
show that if $P$ is almost always hard against linear adversaries, then $f_{G,P}$ is not
only small-biased but (1) it is k-wise independent and (2) it cannot be inverted 
by myopic backtracking algorithms. 

First, we observe that for local functions the small-bias property immediately 
implies k-wise independence. (This is in general false for non-local functions.) 


Lemma 5. Let $f : \{0,1\}^n \to \{0,1\}^m$ be a $d$-local function which is $2^{-kd}$-biased.
Then it is also $k$-wise independent.


Proof. Assume towards a contradiction that $f$ is not $k$-wise independent. Then,
there exists a set of $k$ outputs $T$ and a linear distinguisher $L$ for which $\varepsilon =
|\Pr[L(y_T) = 1] - \Pr[L(u) = 1]| > 0$, where $y = f(x)$ for a uniformly random $x$
and $u$ is a uniformly random string of length $k$. Since $f$ is $d$-local, $y_T$ is sampled
by using fewer than $kd$ bits of randomness, so both probabilities are integer multiples
of $2^{-(kd-1)}$, and therefore $\varepsilon > 2^{-kd}$, contradicting the assumption that $f$ is $2^{-kd}$-biased.


Recall that the proof of our main theorem, Theorem 1, establishes $k$-wise independence
as an intermediate step (Section 3.1). However, the above lemma is
stronger in the sense that it holds for every fixed graph and every output length 
including ones that are not covered by the main theorem. 
By plugging in known results about $k$-wise independent distributions, it immediately
follows that if a local function is sufficiently small-biased, then it is
pseudorandom against $\mathrm{AC}^0$ circuits [15], linear threshold functions over the
reals [18], and degree-2 threshold functions over the reals [19].

Attacks on local functions, which are actively studied in the context of algorithms
for constraint-satisfaction problems, appear to be based mainly on “local”
heuristics (DPLL, message-passing algorithms, random-walk based algorithms)
or linearization [9]. Hence, it appears that in the context of local functions, the
small-bias property already covers all “standard” attacks. We support this intuition
by showing that if $P$ is non-degenerate, then the outputs of $f_{G,P}$ are not
merely min-wise independent, but have a stronger property: Even after reading
an arbitrary set of $t$ outputs, the posterior distribution on every set of $\ell$ inputs,
while not uniform, still has large min-entropy. We call this property robustness.

The notion of robustness was used by Cook et al. to prove that myopic 
backtracking algorithms cannot invert $f_{G,P}$ in polynomial time when $m = n$. We
now argue that for $f_{G,P}$, robustness is almost always a consequence of small bias,
and conclude that $f_{G,P}$ cannot be inverted by myopic backtracking algorithms
even when $m = n^{1+\varepsilon}$, $\varepsilon < 1/4$, as long as $P$ is non-degenerate. (The analysis
of also applies to some degenerate predicates. ) 


5.1 Robustness and Myopic Backtracking Algorithms 


Robustness. Let $f : \{0,1\}^n \to \{0,1\}^m$. Let $L \subseteq [n]$ be a set of inputs, and
$t, h \in [m]$. We say that $f$ is $(t, L, h)$-robust if for every set of outputs $T \subseteq [m]$
of size $t$ and every string $z \in \{0,1\}^t$ the following holds. Let $x \in \{0,1\}^n$ be a
uniformly chosen string conditioned on the event $f(x)_T = z$, i.e., the outputs
which are indexed by $T$ equal $z$. Then the random variable $x_L = (x_i)_{i \in L}$ has
min-entropy of at least $h$, namely, for every fixed $w \in \{0,1\}^{|L|}$, $\Pr[x_L = w] \le 2^{-h}$. The
function is $(t, \ell, h)$-robust if it is $(t, L, h)$-robust for every $\ell$-size input set $L$.

In the full version of this work, we prove that if $f_{G,P}$ is $k$-wise independent
with respect to a random graph, then it is also robust for shorter output length.


Lemma 6. Suppose that $P$ is a predicate for which $f_{G,P} : \{0,1\}^n \to \{0,1\}^m$ is
$k$-wise independent, whp over the choice of a random $(m,n,d)$ graph $G$. Then,
whp over the choice of a random $(m - r, n, d)$ graph $H$, the function $f_{H,P}$ :
{0,1}" > {0,1}™~" is (t,£, h)-robust, where h = min (2, r- (L/n)!/2,k — t). 


In the case of linear stretch, $m = n + O(n)$, where $k$ is linear as well (Corollary 1),
one can get $(t, \ell, h)$-robustness with linear parameters at the expense of
a linear decrease in the output length (e.g., $r = m/2$). When the output is polynomial,
$m = n^{1+\varepsilon}$ (for $\varepsilon < 1/4$), we get $(t, \ell, h)$-robustness for inverse-polynomial
parameters, again at the expense of a linear decrease in the output length (e.g.,
$r = m/2$).

Robustness is especially useful if the actual number of preimages of y = 
fa.p(2) is relatively small compared to 2”. In this case, an algorithm which 
attempts to guess ¢ bits of a preimage x based on t outputs is likely to be wrong 
(obtain a partial assignment that does not correspond to any preimage of $y$). We
show that in our setting of parameters (when the output length is large) most
inputs have a small number of siblings under $f_{G,P}$ (where $G$ is random). The
proof of the following lemma is given in the full version.


Lemma 7. Let P be any nonconstant predicate. For every n > 0 there exists a 
constant M such that when m > 2”4n log n, 


Pr [I{2’ | 2! is a preimage of fa,p(«)}| < M] > 1- n. 


y 


Myopic DPLL algorithms. We now show how the simple statistical properties
proved in the above lemmas yield lower bounds for DPLL algorithms that attack
$f_{G,P}$. The high-level argument is similar to the one used in [3,16] and it is only
sketched here. Consider the following myopic backtracking DPLL algorithm,
whose input consists of $y = f_{G,P}(x)$ where $x$ is uniformly chosen. The algorithm
is allowed to read the entire graph $G$, but it reads the values of $y$ in an incremental
way. Specifically, in each iteration the algorithm adaptively chooses an input
variable $x_i$ and asks to reveal $r$ new output bits of $y$. Then it guesses the value
of $x_i$ based on its current state and on the output bits that were already revealed
(including the ones that were revealed in previous iterations). If the algorithm
reaches a contradiction, i.e., its partial assignment to $x$ is inconsistent with some
output, it backtracks.

Suppose that $f_{G,P}$ satisfies Lemmas 6 and 7. Since $f_{G,P}$ is $k$-wise independent,
the algorithm does not backtrack in the first $k/r$ steps (as some partial
assignment is consistent with every value of $k$ outputs). Since $f$ is $(r \cdot \ell, \ell, h)$-robust
and the number of siblings of a random $x$ is at most $M$ whp, the partial
assignment chosen by the algorithm after $\ell < k$ steps is likely to be globally
inconsistent (there are 2” locally consistent assignments while there are only 
M « 2" globally consistent assignments). Hence, with all but negligible proba- 
bility, the algorithm will err during the first @ steps, and therefore will backtrack 
at some point after more than k steps. It can be shown (by standard lower- 
bound on resolution [10J2]) that, for a random graph, the backtracking phase 
takes super-polynomial time. (By plugging in the exact parameters the lower- 
bound is exponential 2?) when m = O(n) or sub-exponential exp(n°) when 
man's.) 
Abstract. We initiate a study of randomness condensers for sources
that are efficiently samplable but may depend on the seed of the condenser.
That is, we seek functions $\mathrm{Cond} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ such
that if we choose a random seed $S \leftarrow \{0,1\}^d$, and a source $X = A(S)$
is generated by a randomized circuit $A$ of size $t$ such that $X$ has min-entropy
at least $k$ given $S$, then $\mathrm{Cond}(X; S)$ should have min-entropy at
least some $k'$ given $S$. The distinction from the standard notion of randomness
condensers is that the source $X$ may be correlated with the seed
$S$ (but is restricted to be efficiently samplable). Randomness extractors
of this type (corresponding to the special case where $k' = m$) have been
implicitly studied in the past (by Trevisan and Vadhan, FOCS '00).
We show that: 

— Unlike extractors, we can have randomness condensers for samplable, 
seed-dependent sources whose computational complexity is smaller 
than the size t of the adversarial sampling algorithm A. Indeed, we 
show that sufficiently strong collision-resistant hash functions are 
seed-dependent condensers that produce outputs with min-entropy 
$k' = m - O(\log t)$, i.e. logarithmic entropy deficiency.

— Randomness condensers suffice for key derivation in many crypto- 
graphic applications: when an adversary has negligible success proba- 
bility (or negligible “squared advantage” [3]) for a uniformly random 
key, we can use instead a key generated by a condenser whose output 
has logarithmic entropy deficiency. 

— Randomness condensers for seed-dependent samplable sources that 
are robust to side information generated by the sampling algorithm 
imply soundness of the Fiat-Shamir Heuristic when applied to any 
constant-round, public-coin interactive proof system. 
1 Introduction


Randomness extractors — functions that convert sources of biased and/or cor- 
related bits into almost uniformly distributed bits — have a wide variety of 
applications in cryptography and other parts of theoretical computer science. 
However, to extract randomness from rich models of sources, e.g. sources for 
which we only have a lower bound on their min-entropy (or even sources where 
each bit is mildly unpredictable given the previous ones), deterministic functions 
cannot be randomness extractors [30]. Thus the general definition of randomness 
extractor by Nisan and Zuckerman allows the extractor to be probabilistic 
— the extractor is given a uniformly random seed that it can use as a catalyst 
for extraction. 

The need for a seed, however, is a problem in some applications of randomness 
extractors. First, if the reason for extraction is lack of access to high-quality 
random bits, then we may not have any way to generate the seed.¹ (In algorithmic
applications of randomness extractors, it is often possible to try all possible seeds, 
and combine the results obtained for each extractor output. But this does not 
work in most cryptographic applications. Even one bad seed can compromise 
one’s secrets, and thus eliminate security.) Second, even if we can generate a 
uniformly random seed, it is crucial that the weak random source from which 
we extract is independent from the seed. This means that it is problematic 
to generate the seed once and for all (perhaps using an expensive source of 
randomness) in hope that it can be used for all future randomness extractions. 
If there is any chance that the future weak sources can be influenced by the seed, 
then the extractor guarantees will be lost. For example, if the seed is stored in 
some hardware random number generator (RNG) that extracts from physical 
sources of randomness within the computer (e.g. timing of various events), these 
sources may be affected by the internal computations of the RNG itself and thus 
we have correlations between the seed and the sources. 

Such considerations and others have motivated a revival in the study of de- 
terministic extractors over the past decade, i.e. extractors that do not require a 
seed. Since deterministic extraction is impossible for general weak sources of ran- 
domness, this body of work has sought to identify the richest classes of sources 
for which deterministic extraction is possible, and construct explicit extractors 
for those sources. Most of the studied models of such “extractable sources” 
(e.g. bit-fixing sources [9], discrete control sources [26] or multiple independent 
sources [8]) implicitly or explicitly require independence between different por- 
tions of the source. To avoid this, Trevisan and Vadhan [34] suggested study- 
ing the class of samplable sources, sources generated by efficient algorithms, e.g. 
polynomial-sized circuits. They showed that for every t, there exist (non-explicit) 
deterministic extractors for sources generated by circuits of size t, provided that 
the min-entropy of the source is w(log t). Moreover, this result is based on a prob- 
abilistic argument, and can be viewed as giving an explicit seeded extractor that 


1 Actually, using 2-source extractors [BILI], the seed can also be weakly random, but 
it still needs to be independent from the source. 
works for seed-dependent sources in the following sense. We generate once and
for all a random seed S for the extractor, then an adversary A of size t generates 
a source X = A(S) (using additional randomness) with the property that X has 
enough min-entropy given S, and our extractor Ext(X; S) produces an output 
that is statistically close to uniform given S. (We remark that [34] also gave 
an explicit and seedless extractor for samplable sources having min-entropy rate 
close to 1 based on some strong complexity assumptions, and subsequent works 
have given explicit and seedless extractors for sources sampled by weaker models
of computation, such as small-space algorithms [24,25,23] and constant-depth
circuits [35].) 

A deficiency of the above extractors is that their computational complexity is
poly($t$), larger than the complexity of the adversary generating the source. As
observed in [34], this is inherent. If the adversary has more resources than the
extractor, then it can randomly generate inputs on which the first few bits of
the extractor's output are constant (and this will be a high min-entropy source).
More precisely, if the adversary's running time is larger than the extractor's by
a factor of $t$, it can fix roughly $\log t$ bits of the output (and generate a source on
$n$ bits of min-entropy approximately $n - \log t$).
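
The attack described above is rejection sampling, and a toy version is easy to sketch. The code below is illustrative only: `ext` is a stand-in built from SHA-256 (an assumption, not an extractor from the paper), and `biased_source` and its parameters are hypothetical names. The adversary spends about $2^j$ evaluations to fix $j$ output bits, and the resulting source loses about $j$ bits of min-entropy.

```python
import hashlib
import random

def ext(x: bytes, seed: bytes) -> bytes:
    """Stand-in 'extractor' (assumption for illustration): SHA-256 of seed || x."""
    return hashlib.sha256(seed + x).digest()

def biased_source(seed: bytes, j: int, rng: random.Random, n_bytes: int = 16):
    """Sample uniform x until the first j bits (1 <= j <= 8) of ext(x, seed) are zero."""
    tries = 0
    while True:
        tries += 1
        x = bytes(rng.getrandbits(8) for _ in range(n_bytes))
        if ext(x, seed)[0] >> (8 - j) == 0:
            return x, tries

rng = random.Random(0)
x, tries = biased_source(b"public-seed", 4, rng)
print(tries, ext(x, b"public-seed")[0] >> 4)
```

With $j = \log t$, the expected number of evaluations is about $t$, which matches the factor-$t$ advantage in running time mentioned in the text.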

The starting point for our paper is the observation that the above attack is not 
so bad. If the adversary can only reduce the min-entropy of the extractor’s output 
by a logarithmic number of bits, we have still achieved something very nontrivial 
and useful. Indeed, we will have what is called a randomness condenser [28,29]
— which takes an n-bit source with at least some k bits of min-entropy and 
outputs an m-bit source with at least some k’ bits of min-entropy. Randomness 
condensers are nontrivial when the output entropy deficiency m — k’ is smaller 
than the input entropy deficiency n — k (otherwise we could condense just by 
truncating the source). They have been extensively studied in the literature as a 
building block towards constructing randomness extractors (starting with [29], 
and continuing in some of the latest extractors [20]), as well as bipartite expander 
graphs [83]7]. 

Here we note that condensers are useful in their own right. If the entropy
deficiency of the output is at most $\beta$, then any event that occurs with probability
$p$ under a uniformly random string can occur under the condenser's output with
probability at most $p' = 2^{\beta} \cdot p$. For example, if $p$ is negligible and $\beta$ is logarithmic,
then $p'$ is also negligible.
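
The bound $p' \le 2^{\beta} p$ follows because every $m$-bit string has probability at most $2^{-(m-\beta)}$ under a distribution with min-entropy $m - \beta$. A small exact check on one such distribution is sketched below; the flat distribution on the first $2^{m-\beta}$ strings and the particular event are illustrative assumptions.

```python
from fractions import Fraction

def check_deficiency_bound(m, beta, event):
    """R is uniform on the first 2^(m-beta) strings of {0,1}^m, so it has
    min-entropy exactly m - beta. Returns (p, p_prime, whether p' <= 2^beta * p)."""
    support = range(2 ** (m - beta))
    p = Fraction(len(event), 2 ** m)  # probability of the event under uniform
    p_prime = Fraction(sum(1 for r in support if r in event), 2 ** (m - beta))
    return p, p_prime, p_prime <= 2 ** beta * p

p, p_prime, ok = check_deficiency_bound(m=8, beta=3, event={0, 1, 2, 200})
print(p, p_prime, ok)
```

Here $p = 1/64$ and $p' = 3/32$, which is within the bound $2^3 \cdot p = 1/8$.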

Motivated by the above, we initiate a study of condensers for samplable sources. 


DEFINING SEED-DEPENDENT RANDOMNESS CONDENSERS. We define a condenser
for seed-dependent samplable sources to be a function $\mathrm{Cond} : \{0,1\}^n \times
\{0,1\}^d \to \{0,1\}^m$ with the following property. If $S \leftarrow U_d$, and $X = A(S)$ is a
source with (min-)entropy at least $k$ given $S$, generated by a randomized circuit
$A$ of size at most $t$, then we require that $\mathrm{Cond}(X; S)$ should be (close to) a
source with min-entropy at least $k'$ given $S$. We provide a number of variants of
this definition, using different measures of conditional entropy, and also consider
the case that $A$ generates side information along with $X$ (to be discussed more
below).
CONDENSERS FROM CR HASHING. We show that sufficiently strong collision-resistant
hash functions provide good seed-dependent condensers for samplable
sources. Here the seed is simply a description of a hash function $h$ from the
family, and $\mathrm{Cond}(x; h) = h(x)$. We show that if efficient algorithms can find
collisions in the hash functions with probability at most $2^{\beta}/2^m$, then the condenser
output will have min-entropy $k' \approx m - \beta$ given the seed (for sources of
min-entropy larger than $m$). Note that a birthday attack will find collisions with
probability $O(t^2/2^m)$ in time $t$. If time $t$ algorithms cannot do much better, e.g.
the probability of finding collisions is at most $\mathrm{poly}(t)/2^m$, then we can achieve
entropy deficiency $\beta = O(\log t)$, within a constant factor of the lower bound
mentioned above.
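
The birthday attack referred to above can be sketched on a deliberately small hash. The toy hash `h` below (SHA-256 truncated to `m_bits` bits) and the function names are assumptions for illustration; with a 16-bit output a collision among distinct inputs typically appears after a few hundred evaluations, in line with the $O(t^2/2^m)$ success probability.

```python
import hashlib

def h(x: bytes, m_bits: int = 16) -> int:
    """Toy m-bit hash (assumption for illustration): SHA-256 truncated to m_bits."""
    digest = int.from_bytes(hashlib.sha256(x).digest(), "big")
    return digest >> (256 - m_bits)

def birthday_collision(m_bits: int = 16):
    """Hash distinct inputs 0, 1, 2, ... until two of them collide."""
    seen = {}
    i = 0
    while True:
        x = i.to_bytes(8, "big")
        y = h(x, m_bits)
        if y in seen:
            return seen[y], x, i + 1  # colliding pair and number of evaluations
        seen[y] = x
        i += 1

a, b, evaluations = birthday_collision(16)
print(a.hex(), b.hex(), evaluations)
```

The loop always terminates within $2^{m} + 1$ evaluations by the pigeonhole principle; a hash for which no time-$t$ algorithm does substantially better than this generic attack is the kind of "sufficiently strong" function the paragraph has in mind.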


CONDENSERS AND KEY DERIVATION. We formalize the applicability of seed- 
dependent condensers to key derivation. Specifically, we consider using the out- 
put of a condenser as a key in a cryptographic application, and show that for 
“unpredictability” applications (where an adversary can win in a security game 
with at most negligible probability), security is preserved if the output entropy 
deficiency $\beta$ is small enough (e.g. logarithmic). For indistinguishability applications,
we follow [3] and show that security is preserved if the “squared advantage”
is negligible, which can be achieved for a number of applications. These
results provide the first formal evidence that when seed-dependent sources arise 
in practice [21] security is not immediately compromised. 


CONDENSERS AND FIAT-SHAMIR. We investigate seed-dependent condensers
for adversaries A(S) that generate some side information Z in addition to X 
(with the requirement that X has min-entropy at least k given S and Z), anal- 
ogously to the notion of average-case extractors introduced by [12]. We observe 
that the most natural generalization of our condenser definition to this setting, 
namely requiring that Cond(X;S) has min-entropy at least k’ given S and Z, is 
impossible to achieve: the adversary A(S) can simply compute Z = Cond(X; S) 
as its side information. However, it seems plausible to have good condensers 
if we provide the side information also as input to the condenser. While this 
may not be feasible in some applications (because we do not know the side in- 
formation), we show that condensers satisfying this definition can be used to 
obtain a sound implementation of the Fiat-Shamir Heuristic for all constant- 
round, public-coin interactive proof systems (ones with statistical soundness), 
and hence show that such protocols cannot be zero knowledge (by connections 
established by Dwork et al.). This novel connection between the Fiat-Shamir
Heuristic and randomness condensing is obtained by observing a close relation 
between seed-dependent condensers for samplable sources tolerating side infor- 
mation and some conjectures of Barak, Lindell, and Vadhan [4] (made in the 
study of zero knowledge and Fiat-Shamir). In fact, this connection only requires 
condensers for “leaky sources” — ones that are uniform prior to conditioning on 
the adversary’s side information — and we show that such condensers are also 
necessary for soundness of the Fiat-Shamir Heuristic. It remains an intriguing 
open problem to give a construction of condensers for leaky sources based on 
some more well-studied complexity assumptions. 
2 Definitions and Preliminaries


ENTROPY AND STATISTICAL DISTANCE. We start by defining the relevant notions
of entropy that we use, which are min-entropy, collision (also known as
Renyi) entropy and Shannon entropy. The Shannon entropy and min-entropy
of a random variable $X$ are defined as $H_1(X) \stackrel{\mathrm{def}}{=} \mathbb{E}_{x \leftarrow X}[-\log \Pr[X = x]]$ and
$H_\infty(X) \stackrel{\mathrm{def}}{=} -\log(\max_x \Pr[X = x])$. We also define average (aka conditional)
Shannon entropy and average min-entropy of a random variable $X$ conditioned
on another random variable $Z$ by $H_1(X|Z) = \mathbb{E}_{(x,z) \leftarrow (X,Z)}[-\log \Pr[X = x \mid Z = z]]$
and $H_\infty(X|Z) = -\log(\mathbb{E}_{z \leftarrow Z}[\max_x \Pr[X = x \mid Z = z]])$ respectively, where
$\mathbb{E}_{z \leftarrow Z}$ denotes the expected value over $z \leftarrow Z$.

The collision probability of a random variable $X$ is defined as $\mathrm{Col}(X) =
\sum_x \Pr[X = x]^2$, and the collision entropy of $X$ is $H_2(X) = \log(1/\mathrm{Col}(X))$.
It is easy to see that for any $X$, $H_\infty(X) \le H_2(X) \le H_1(X)$ and $H_2(X) \le
2H_\infty(X)$. We can also define average collision probability and collision entropy
of a random variable $X$ conditioned on another random variable $Z$ by
$\mathrm{Col}(X|Z) = \mathbb{E}_{z \leftarrow Z}[\mathrm{Col}(X|Z = z)]$ and $H_2(X|Z) = \log(1/\mathrm{Col}(X|Z))$. Once
again, $H_\infty(X|Z) \le H_2(X|Z) \le H_1(X|Z)$ and $H_2(X|Z) \le 2H_\infty(X|Z)$.

We denote with $\mathrm{dist}_D(X,Y)$ the advantage of a function $D$ in distinguishing
the random variables $X, Y$: $\mathrm{dist}_D(X,Y) = |\Pr[D(X) = 1] - \Pr[D(Y) = 1]|$.
The statistical distance between two random variables $X, Y$ is defined by

$$\mathrm{SD}(X,Y) = \frac{1}{2} \sum_x |\Pr[X = x] - \Pr[Y = x]| = \max_D \mathrm{dist}_D(X,Y).$$

We say that $X$ and $Y$ are $\varepsilon$-close if $\mathrm{SD}(X,Y) \le \varepsilon$. We also note that any tuple
$(X, Z)$ is $\varepsilon$-close to $(X', Z)$ such that $H_\infty(X'|Z) \ge H_2(X|Z) - \log(1/\varepsilon)$, which
is often much better than bounding $H_\infty(X|Z) \ge \frac{1}{2} \cdot H_2(X|Z)$.
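
The unconditional quantities defined above are straightforward to compute for a small distribution. The sketch below represents a distribution as a dictionary from outcomes to probabilities (a representation chosen here for illustration) and checks the stated inequalities on one example.

```python
from math import log2

def shannon_entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def min_entropy(p):
    return -log2(max(p.values()))

def collision_entropy(p):
    return -log2(sum(q * q for q in p.values()))

def statistical_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

X = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
U = {k: 0.25 for k in X}  # uniform distribution on the same four outcomes

print(min_entropy(X), collision_entropy(X), shannon_entropy(X))
print(statistical_distance(X, U))
```

For this $X$ the three entropies are 1, about 1.54, and 1.75 bits, consistent with $H_\infty \le H_2 \le H_1$ and $H_2 \le 2H_\infty$, and $X$ is at statistical distance 0.25 from uniform.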


3 Seed-Dependent Condensers 


We now generalize the notion of a condenser to the seed-dependent setting, in 
which the adversarial sampler A of size t can depend on the seed S. As we 
will see, seed-dependent condensers are useful for important applications such 
as cryptographic key derivation. 


Definition 3.1 (Seed-Dependent Condenser). Let $c, c' \in \{1, 2, \infty\}$. An
efficient function $\mathrm{Cond} : \{0,1\}^n \times \{0,1\}^d \rightarrow \{0,1\}^m$ is a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon} [H_{c'} \ge k'], t)$-condenser if for all probabilistic adversaries A of size at most
t who take a random seed $S \leftarrow \{0,1\}^d$ and output (using more coins) a sample
$X \leftarrow A(S)$ of entropy $H_c(X|S) \ge k$, the joint distribution $(S, \mathrm{Cond}(X; S))$ is
$\varepsilon$-close to some (S, R), where $H_{c'}(R|S) \ge k'$.

The quantity $\beta \stackrel{\text{def}}{=} m - k'$ is called the entropy deficit of the condenser. When
$c = c'$ is clear from the context, we say that Cond is a seed-dependent $(k \rightarrow_{\varepsilon} k', t)$-condenser.
We omit the reference to $\varepsilon$ and/or t when $\varepsilon = 0$ and/or $t = \infty$,
respectively.
A notion for traditional condensers arises by replacing A in the definition above
with an unbounded circuit that does not take the seed S as input. Unlike
traditional condensers, seed-dependent condensers require that A be efficient.
Otherwise, an inefficient A can, by repeatedly evaluating the condenser using
the seed S, always find a high-entropy distribution of inputs that map to a
low-entropy output distribution. Moreover, while a seed-dependent extractor can
be defined as the special case of the definition above corresponding to k' = m,
Proposition 3.3 below implies that it is impossible to build a (non-trivial)
seed-dependent extractor.
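The attack by an inefficient sampler can be made concrete on toy parameters. In the sketch below, `toy_cond` is a hypothetical stand-in for a condenser (a seeded SHA-256 truncated to a few bits), not a construction from the paper, and the sampler simply enumerates all inputs. By the pigeonhole principle the largest preimage set has at least $2^{n-m}$ elements, so a source uniform on it has high min-entropy while the output is constant.

```python
import hashlib
from collections import defaultdict

N_BITS, M_BITS = 12, 4  # toy input and output lengths (assumptions)

def toy_cond(x: int, seed: bytes) -> int:
    """Hypothetical stand-in condenser: seeded SHA-256 truncated to M_BITS bits."""
    digest = hashlib.sha256(seed + x.to_bytes(2, "big")).digest()
    return digest[0] >> (8 - M_BITS)

def exhaustive_sampler(seed: bytes):
    """Inefficient seed-dependent sampler: evaluates the condenser on all 2^n
    inputs and returns the largest preimage set. X is uniform on that set."""
    buckets = defaultdict(list)
    for x in range(2 ** N_BITS):
        buckets[toy_cond(x, seed)].append(x)
    return max(buckets.values(), key=len)

seed = b"example-seed"
support = exhaustive_sampler(seed)
outputs = {toy_cond(x, seed) for x in support}
# X has min-entropy log2(len(support)) >= N_BITS - M_BITS bits,
# yet Cond(X; S) takes a single value, i.e. zero output entropy.
```

The work here is exponential in n, which is exactly what the size bound t on the sampler is meant to rule out.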

The following lemma (see proof in [13]) will be useful in several of our later
results. 


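Lemma 3.2. Let $c \in \{1, 2, \infty\}$. Then,

• “Output ($\infty \Rightarrow 2 \Rightarrow 1$)”: If $c' \ge c''$ and Cond is a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon} [H_{c'} \ge k'], t)$-condenser, then Cond is also a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon} [H_{c''} \ge k'], t)$-condenser.

• “Output ($2 \Rightarrow \infty$)”: For any $\gamma > 0$, if Cond is a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon} [H_2 \ge k'], t)$-condenser, then Cond is also a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon+\gamma} [H_\infty \ge k' - \log(1/\gamma)], t)$-condenser and also a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon} [H_\infty \ge k'/2], t)$-condenser.

• “Input ($1 \Rightarrow 2 \Rightarrow \infty$)”: If $c \le c''$ and Cond is a seed-dependent
$([H_c \ge k] \rightarrow_{\varepsilon} [H_{c'} \ge k'], t)$-condenser, then Cond is also a seed-dependent
$([H_{c''} \ge k] \rightarrow_{\varepsilon} [H_{c'} \ge k'], t)$-condenser.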

Thus, it is somewhat preferable (but also the hardest) to build a seed-dependent
$([H_2 \ge k] \rightarrow_{\varepsilon} [H_\infty \ge k'])$-condenser, since it implies a $([H_c \ge k] \rightarrow_{\varepsilon} [H_{c'} \ge k'])$-condenser
for any $c, c' \in \{2, \infty\}$. In contrast, it is preferable to base the security
of a given application on a $([H_\infty \ge k] \rightarrow_{\varepsilon} [H_2 \ge k'])$-condenser, since such
condensers are likely to have slightly better parameters k and k'.

The following negative result shows that the output entropy deficiency
$\beta = m - k'$ must be at least roughly $\log t$ to work for samplers computable in time
t, if the condenser is computable in time significantly less than t. In particular,
we cannot hope for a seed-dependent extractor (i.e. $\beta = 0$) that is computable
in time significantly less than t, generalizing an observation of Trevisan and
Vadhan [34] about deterministic extractors for samplable sources.


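Proposition 3.3. Let $\mathrm{Cond} : \{0,1\}^n \times \{0,1\}^d \rightarrow \{0,1\}^m$ be computable by a
circuit of size t', and let $\beta \in [0, m]$, $\varepsilon, \delta \in (0, 1/2)$. Then for Cond to be a
$([H_\infty \ge n - \alpha] \rightarrow_{\varepsilon} [H_1 \ge m - \beta], t)$-condenser for $\alpha = \lceil (\beta + 1)/(1 - \varepsilon - \delta) \rceil$,
it must be that $\alpha \ge \log t - \log t' - O(\log(1/\delta))$ or $\alpha \ge m$.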


Note that as $\varepsilon, \delta \rightarrow 0$, the ratio between $\alpha$ and $\beta$ approaches 1. Thus, the
proposition says that if we want to decrease the entropy deficiency by any significant
factor, we must settle for output entropy deficiency $\beta \approx \alpha$ that is at least roughly
$\log t$.
HANDLING SIDE INFORMATION. One can naturally generalize the notion of
(regular) extractors and condensers to handle some side information Z about
the source X, yielding the notion of average-case extractors/condensers [12].
Formally, the adversarial sampler A produces a pair (X, Z) such that $H_c(X|Z) \ge k$,
and one requires that the joint distribution $(Z, S, \mathrm{Ext}(X; S))$ is $\varepsilon$-close to
$(Z, S, U_m)$ (for condensers, that $(Z, S, \mathrm{Cond}(X; S))$ is $\varepsilon$-close to (Z, S, R) where
$H_{c'}(R|(S, Z)) \ge k'$).

However, things become a bit trickier in the seed-dependent case that we
introduce in this work. Naturally, the sampler A now takes the seed S to produce
the pair (X, Z). Unfortunately, this means that A can now run the condenser
Cond(X; S) and simply record all or part of this output in the side information Z.
This still leaves the entropy of X high enough (say, if k is noticeably larger than
m), but now the output entropy k' drops to 0. Thus, to make a meaningful but
satisfiable definition in the case of side information, we will relax the syntax of
the condenser Cond to also take the side information Z as part of its input. While
less convenient for some applications, now the previous attack no longer applies,
since the sampler A(S) has to choose Z before R = Cond((X, Z); S) is derived,
making it much harder to “correlate” R and Z. Therefore we say that a
condenser is an average-case, seed-dependent $([H_c \ge k] \rightarrow_{\varepsilon} [H_{c'} \ge k'], t)$-condenser
if $(Z, S, \mathrm{Cond}((X, Z); S))$ is $\varepsilon$-close to (Z, S, R) where $S \leftarrow \{0,1\}^d$, $(X, Z) \leftarrow A(S)$
with $H_c(X|(S, Z)) \ge k$, and $H_{c'}(R|(S, Z)) \ge k'$. A formal definition can be
found in the full version [13].

We notice that Lemma 3.2 clearly extends to the average-case setting. Also,
when Z is empty, this still generalizes the “worst-case” seed-dependent condenser
from Definition 3.1. However, the introduction of side information makes the
notion of seed-dependent condenser very non-trivial to satisfy even when the
source X is perfectly uniform, but some side information Z = f(X) is “leaked”
to the attacker. Indeed, we show in Section 6 that this special case of
average-case condensers (see Definition 6.1) is exactly what is needed to instantiate the
Fiat-Shamir heuristic.

Finally, an equivalent way to think about average-case condensers is to inter- 
pret the output (X, Z) of the sampler as a single (variable-length) source X’, so 
that the condenser is simply applied to X’, but a subset of (known) physical bits 
Z of X’ is leaked to the attacker/distinguisher. 


4 Condensers from Collision Resistance 


In this section we show that a sufficiently strong collision-resistant hash function
(CRHF) gives a good seed-dependent (but not average-case)
$([H_2 \ge k] \rightarrow_{0} [H_2 \ge k'])$-condenser, which also implies non-trivial bounds for other
input/output entropy settings when $c, c' \in \{2, \infty\}$, by Lemma 3.2.


Definition 4.1. A family of hash functions $\mathcal{H} = \{h : \{0,1\}^* \rightarrow \{0,1\}^m\}$ is
$(t, \delta)$-collision-resistant if for any (non-uniform) attacker B of size at most t,
$\Pr[H(X_1) = H(X_2) \wedge X_1 \ne X_2] \le \delta$ where $H \leftarrow \mathcal{H}$ and $(X_1, X_2) \leftarrow B(H)$.

The proof of the following theorem appears in the full version [13].


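Theorem 4.2. Fix any $\beta > 0$. If $\mathcal{H}$ is a $(2t, 2^{\beta-1}/2^m)$-collision-resistant hash
function family, then $\mathrm{Cond}(X; H) \stackrel{\text{def}}{=} H(X)$ for $H \leftarrow \mathcal{H}$ is a seed-dependent
$([H_2 \ge m - \beta + 1] \rightarrow [H_2 \ge m - \beta], t)$-condenser with entropy deficit $\beta$ and
no error.

In particular, it is also a seed-dependent $([H_\infty \ge m - \beta + 1] \rightarrow [H_2 \ge m - \beta], t)$-condenser
and a $([H_\infty \ge m - \beta + 1] \rightarrow_{\varepsilon} [H_\infty \ge m - \beta - \log(1/\varepsilon)], t)$-condenser.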
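To see why these parameters fit together, one can bound the output collision probability as follows. This is only a sketch consistent with the stated parameters, under the assumption that $X_1, X_2$ denote two independent samples of $A(H)$ for the same $H \leftarrow \mathcal{H}$ (so that producing the pair costs size about $2t$); the actual proof is in the full version [13].

```latex
\begin{align*}
\mathrm{Col}(H(X) \mid H)
  &= \Pr[H(X_1) = H(X_2)] \\
  &\le \Pr[X_1 = X_2] + \Pr[H(X_1) = H(X_2) \wedge X_1 \ne X_2] \\
  &\le \mathrm{Col}(X \mid H) + 2^{\beta-1}/2^{m} \\
  &\le 2^{-(m-\beta+1)} + 2^{-(m-\beta+1)} \;=\; 2^{-(m-\beta)} .
\end{align*}
```

The first term is bounded using $H_2(X|H) \ge m - \beta + 1$ and the second using collision resistance, which gives $H_2(H(X)|H) \ge m - \beta$.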


PARAMETERS. To obtain a good entropy deficit $\beta$ as a function of the sampler’s
complexity t, we need to understand the best possible $(2t, \delta)$-collision-resistant
security of $\mathcal{H}$. Clearly, a birthday attack (essentially) implies that $\delta = \Omega(t^2/2^m)$,
since the attacker can pick t random points, evaluate h on them, and hope for
some collision. Conversely, this bound is tight in the random oracle model, and
state-of-the-art hash functions more or less assume that the “birthday attack” is
the only possible attack on a good CRHF design. For example, birthday attacks
are currently the best known attacks on many popular hash functions, such as
SHA-256, SHA-512, and the new SHA-3 functions, as well as discrete-log based
CRHFs over many elliptic curve groups (cf. [32]). Thus, under such (strong
but reasonable) assumptions, all the above popular hash functions achieve
$\delta = O(t^2/2^m)$, which means that we can set $2^{\beta-1} = O(t^2)$, resulting in $\beta = 2\log t + O(1)$.
More generally, if the best collision-finding attack has success probability
$\delta = \mathrm{poly}(t)/2^m$, then $\beta = O(\log t)$.
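The arithmetic behind $\beta = 2\log t + O(1)$ can be spelled out for concrete numbers. The sketch below assumes, purely for illustration, that the best collision finder of size T succeeds with probability at most $c \cdot T^2/2^m$ for a constant c; the function name and the constant are hypothetical. Solving $2^{\beta-1}/2^m = c \cdot (2t)^2/2^m$ gives $\beta = 2\log_2 t + \log_2 c + 3$.

```python
import math

def entropy_deficit(t: int, c: float = 1.0) -> float:
    """Entropy deficit beta for samplers of size t, assuming (hypothetically)
    that a size-T collision finder succeeds with probability <= c * T^2 / 2^m.
    Theorem 4.2 needs (2t, 2^(beta-1)/2^m)-collision resistance, so
    2^(beta-1) = c * (2t)^2, i.e. beta = 2*log2(t) + log2(c) + 3."""
    return 2 * math.log2(t) + math.log2(c) + 3

beta_40 = entropy_deficit(2 ** 40)   # samplers of size 2^40
out_entropy_256 = 256 - beta_40      # guaranteed output collision entropy for m = 256
```

Under this assumption a 256-bit hash would retain 173 bits of collision entropy against samplers of size $2^{40}$, provided the source has collision entropy at least $m - \beta + 1 = 174$ bits.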

Corollary 4.3. Assuming the existence of $(t, O(t^2/2^m))$-collision-resistant hash
functions, there exists a seed-dependent $([H_2 \ge m - \beta + 1] \rightarrow [H_2 \ge m - \beta], t)$-condenser
with entropy deficit $\beta = 2\log t + O(1)$ and no error.

In particular, it is also a seed-dependent $([H_\infty \ge m - \beta + 1] \rightarrow [H_2 \ge m - \beta], t)$-condenser
with entropy deficit $\beta = 2\log t + O(1)$ and no error, and a
$([H_\infty \ge m - \beta + 1] \rightarrow_{\varepsilon} [H_\infty \ge m - \beta - \log(1/\varepsilon)], t)$-condenser with entropy
deficit $\beta' = 2\log t + \log(1/\varepsilon) + O(1)$ and error $\varepsilon$.


AVERAGE-CASE SETTING? Unfortunately, the proof of Theorem 4.2 does not
extend to average-case seed-dependent condensers. The problem is that when
estimating the value $\mathrm{Col}(H(X, Z)|(H, Z))$, one already needs to sample two
sources $X_1$ and $X_2$ corresponding to the same side information Z, which seems
to be hard. A bit more formally, a natural attempt to define a collision-finding
adversary B would be to first let B(H) run A(H) to produce a tuple $(X_1, Z_1)$,
and then run A(H) several more times to try to produce a second tuple $(X_2, Z_2)$
with the hope that $Z_2 = Z_1$. But this will not be guaranteed to be efficient
unless Z is very short (e.g., just a few bits). In some sense, the difficulty of
handling side information might be expected, since we show that average-case
seed-dependent condensers are enough to instantiate the random oracle in the
Fiat-Shamir heuristic (see Section 6), which is a long-standing open problem.
5 Application to Key Derivation


Consider any cryptographic primitive P (e.g., digital signatures, encryption,
etc.), which uses randomness $R \in \{0,1\}^m$ to derive its secret (and, if
needed, public) key(s). Without loss of generality, we can assume that R itself is the
secret key. In the “ideal” setting, $R \leftarrow \{0,1\}^m$ is chosen uniformly at random,
and the attacker B against P obtains no knowledge about the choice of R, except
for what is revealed by P. In practice, however, R is not perfectly uniform. For
example, it may be the output of a system random number generator (RNG)
that attempts to extract uniform bits from a source of entropy. To guarantee
security for the widest range of settings, we ask for the key derivation to be
secure even against seed-dependent, adversarially-manipulated sources. However,
Proposition 3.3 shows that, at least in general, no extractors exist that work
for such a strong adversarial model. We therefore turn to seed-dependent
condensers, showing that these yield strong positive results about the security of
key derivation.

Towards this, we model the “real” seed-dependent setting as follows. Let
$S \leftarrow \{0,1\}^d$ be a random seed, and let $X \leftarrow A(S)$ be sampled by an
adversarial sampler A. Finally, the cryptographic primitive P uses $R \leftarrow \mathrm{Cond}(X; S)$
as the key. While the above model is the one of most direct practical
interest, we will actually consider the more general case of average-case
condensing, in which an attacker B against P obtains part of the input to the condenser,
the side information Z. The resulting real/ideal settings for deriving the key for
P are formalized by the procedures Real(A) and Ideal(A):


Real(A):                                  Ideal(A):
  $S \leftarrow \{0,1\}^d$                      $S \leftarrow \{0,1\}^d$
  $(X, Z) \leftarrow A(S)$                      $(X, Z) \leftarrow A(S)$
  $R \leftarrow \mathrm{Cond}((X, Z); S)$       $R \leftarrow \{0,1\}^m$
  Return (R, S, Z)                          Return (R, S, Z)
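The two procedures translate almost line by line into code. The following is a minimal sketch, assuming a hypothetical stand-in condenser `toy_cond` (seeded SHA-256, truncated) and a hypothetical sampler `toy_sampler`; neither is prescribed by the paper, and the byte lengths are arbitrary.

```python
import hashlib
import secrets

D_BYTES, M_BYTES = 16, 16  # seed and key lengths in bytes (assumptions)

def toy_cond(x: bytes, z: bytes, seed: bytes) -> bytes:
    """Hypothetical stand-in for Cond((X, Z); S): seeded SHA-256, truncated."""
    # Length-prefix x so that the pair (x, z) is encoded unambiguously.
    data = seed + len(x).to_bytes(4, "big") + x + z
    return hashlib.sha256(data).digest()[:M_BYTES]

def toy_sampler(seed: bytes):
    """Hypothetical sampler A(S) returning a source X and side information Z.
    An adversarial sampler could make X depend on the seed arbitrarily."""
    x = secrets.token_bytes(32)
    z = x[:1]  # leak one byte of X as side information
    return x, z

def real(sampler):
    seed = secrets.token_bytes(D_BYTES)   # S <- {0,1}^d
    x, z = sampler(seed)                  # (X, Z) <- A(S)
    r = toy_cond(x, z, seed)              # R <- Cond((X, Z); S)
    return r, seed, z

def ideal(sampler):
    seed = secrets.token_bytes(D_BYTES)   # S <- {0,1}^d
    x, z = sampler(seed)                  # (X, Z) <- A(S); X is not used for R
    r = secrets.token_bytes(M_BYTES)      # R <- {0,1}^m
    return r, seed, z
```

In both procedures the attacker later sees S and Z; the only difference is whether the key R is derived from the source or chosen independently.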


The two procedures are parameterized by a sampler A that on input the seed 
S outputs a pair (X, Z). We assume that the sampler A has size at most t 
and produces a source X of (conditional) min-entropy $H_\infty(X|(S, Z)) \ge k$, for
some parameters t and k. We call such samplers (t, k)-bounded. Sometimes, to 
emphasize the dependence on the sampler complexity t and source min-entropy 
k, we will refer to the above two settings as the (t, k)-real and (t, k)-ideal models, 
respectively. 

The side information Z naturally models information about the random source 
X that may be leaked to an adversary via a side channel. However, in most or all 
practical scenarios, our assumption that the value of Z is known and available to 
the condenser is unrealistic. Thus, we will also state our results for the analogous 
models without side information, meaning we omit Z in both the real and ideal 
models. 


2 For example, the Linux RNG folds prior outputs back into its entropy pool [21].
DEFINING REAL/IDEAL SECURITY. We assume that the security of the
cryptographic primitive P is defined via an interactive game between a probabilistic
attacker B(s,z) and a probabilistic challenger C(r). Here one should think of s 
and z as particular values of the seed and the side information, respectively, and 
r as a particular value used by the challenger in the key generation algorithm 
of P. We note that C only uses the secret key r and does not directly depend 
on s and z. In particular, in the ideal model, the values s and z are not re- 
ally useful to the actual attacker B, since the key r used by the challenger C is 
chosen completely independently from these values. Still, we include them for 
consistency. 

At the end of the game, C(r) outputs a bit b, where b = 1 indicates that the
attacker “won the game”. Since C is fixed by the definition of P (e.g., C runs the
unforgeability game for signatures or the semantic security game for encryption,
etc.), we denote by $D_B(r, s, z)$ the (abstract) distinguisher which simulates the
entire game between B(s, z) and C(r) and outputs the bit b. We also let

$$\mathrm{Adv}_B(r, s, z) = \Pr[D_B(r, s, z) = 1] - c$$

be the advantage of B(s, z) to win the game against C(r), where c = 0 for
unpredictability applications (one-way functions, signatures, etc.) and c = 1/2
for indistinguishability applications (encryption, pseudorandom functions, etc.).
Thus, $\mathrm{Adv}_B(\cdot) \in [0, 1]$ for unpredictability applications and $\mathrm{Adv}_B(\cdot) \in [-\frac{1}{2}, \frac{1}{2}]$
for indistinguishability applications. When B is clear from the context, we simply
write Adv(r, s, z).

In the following security definition for P, we will use the letter T to denote 
the maximum allowable resources of B, which include all the efficiency measures
we might care about in the corresponding application, such as the circuit size, 
number of oracle queries, etc. We say that such a B is T-limited. 


Definition 5.1. Given a sampler A and an attacker B, we define their ideal
advantage $\Delta(A, B) \stackrel{\text{def}}{=} |\mathbb{E}[\mathrm{Adv}_B(\mathrm{Ideal}(A))]|$. We say that P is $(T, \delta)$-secure
in the (t, k)-ideal model if for any (t, k)-bounded sampler A and any T-limited
attacker B, $\Delta(A, B) \le \delta$. Similarly, given A and B, we define their real
advantage $\Delta'(A, B) \stackrel{\text{def}}{=} |\mathbb{E}[\mathrm{Adv}_B(\mathrm{Real}(A))]|$. We say that P is $(T', \delta')$-secure in the
(t, k)-real model if for any (t, k)-bounded sampler A and any T'-limited attacker
B, $\Delta'(A, B) \le \delta'$.


5.1 Simple Bound for Unpredictability Applications 


As our first attempt, we would like to argue that if P is $(T, \delta)$-secure in the
ideal setting, then P is also $(T', \delta')$-secure in the real setting, where T' is not
much lower than T, and, more importantly, $\delta'$ is not much larger than $\delta$. With
traditional extractors, this is done by arguing that the derived real key R is
(statistically) $\varepsilon$-close to $U_m$, even conditioned on S and Z. This means that
$\delta' \le \delta + \varepsilon$. Unfortunately, in the seed-dependent setting it is impossible to
achieve statistical extraction, as shown by Proposition 3.3. In this section, we
observe that it is not strictly necessary to argue statistical extraction: if the original
ideal security $\delta$ is low enough, a good enough condenser (achievable even in
the seed-dependent setting) might result in “real” security $\delta'$ not much larger
than the “ideal” security $\delta$. At least, we show that this intuition is true for
unpredictability applications (where, recall, $\mathrm{Adv}(\cdot) \ge 0$) in the following lemma.


Lemma 5.2. Assume P is some unpredictability application which is $(T, \delta)$-secure
in the (t, k)-ideal model, and Cond is an average-case seed-dependent
$([H_\infty \ge k] \rightarrow_{\varepsilon} [H_\infty \ge k'], t)$-condenser with entropy deficit $\beta = m - k'$. Then
P is $(T, \delta')$-secure in the (t, k)-real model, where $\delta' \le \varepsilon + \delta \cdot 2^{\beta}$. If instead Cond is
a (non-average-case) seed-dependent $([H_\infty \ge k] \rightarrow_{\varepsilon} [H_\infty \ge k'], t)$-condenser,
then P is $(T, \delta')$-secure in the (t, k)-real model without side information.


PARAMETERS. In essence, Lemma 5.2 states that the security $\delta$ degrades
exponentially with the entropy deficit $\beta$ of our seed-dependent condenser. Recall
that $\beta = O(\log t)$ is the best we can hope for (by Proposition 3.3); this would
give a meaningful security guarantee $\delta' \approx \delta \cdot \mathrm{poly}(t)$, as long as $\delta \ll 1/\mathrm{poly}(t)$.
For example, for the non-average-case setting, we can combine the bound in
Lemma 5.2 with the construction from Corollary 4.3 to show that an
$O(t^2)/2^m$-collision-resistant hash function suffices for real model security.
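A quick numeric reading of the bound $\delta' \le \varepsilon + \delta \cdot 2^{\beta}$ may help. In the sketch below the sampler size, the ideal security level, and the additive constant 3 in the deficit are assumed values chosen for illustration only.

```python
import math

def real_security_unpredictability(delta: float, beta: float, eps: float = 0.0) -> float:
    """Bound of Lemma 5.2 for unpredictability applications:
    delta' <= eps + delta * 2^beta."""
    return eps + delta * 2.0 ** beta

t = 2 ** 30                       # assumed sampler size
beta = 2 * math.log2(t) + 3       # assumed deficit 2*log t + O(1); the 3 is illustrative
delta = 2.0 ** -128               # assumed ideal security
bound_52 = real_security_unpredictability(delta, beta)
```

With these numbers the real-model bound is $2^{-65}$: a loss of 63 bits of security, which is tolerable only because the assumed ideal security was far below $1/\mathrm{poly}(t)$ to begin with.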


5.2 General Bound through Squared Advantage 


The bound of Lemma 5.2 only holds for unpredictability applications, and also
requires seed-dependent condensers guaranteeing the min-entropy of the
extracted key R. In this section we show a more general bound which also holds for
indistinguishability applications, has better dependence on the entropy deficit
of the condenser, and needs a slightly weaker type of seed-dependent condenser
for collision entropy. However, the small price we pay for such improvements is
that we can no longer directly relate the real security $\delta'$ of our application to its
ideal security $\delta$. Rather, we use the notion of the squared advantage $\Delta_2(A, B)$,
and will relate the real advantage $\Delta'(A, B)$ to $\Delta_2(A, B)$, which will in turn relate $\delta'$ to the
“square-security” $\sigma$ which we define below. This notion of squared advantage/security
was implicitly introduced by Barak et al. [3] in the “seed-independent” setting
(to improve the entropy loss of the Leftover Hash Lemma), who also showed that
for many important applications the value $\sigma$ is not “too much worse” than $\delta$ (see
the full version [13] for more details).


Definition 5.3. Given a sampler A and an attacker B, we define their (ideal)
square advantage $\Delta_2(A, B) \stackrel{\text{def}}{=} \mathbb{E}[\mathrm{Adv}_B(\mathrm{Ideal}(A))^2]$. We say that P is $(T, \sigma)$-square-secure
in the (t, k)-ideal model if for any (t, k)-bounded sampler A and
any T-limited attacker B, $\Delta_2(A, B) \le \sigma$.


We can now state our improved bound, and then compare it to our previous
bound from Lemma 5.2. The proof appears in the full version [13].


Lemma 5.4. Assume P is any application which is $(T, \sigma)$-square-secure in the
(t, k)-ideal model, and Cond is an average-case seed-dependent
$([H_\infty \ge k] \rightarrow_{\varepsilon} [H_2 \ge k'], t)$-condenser with entropy deficit $\beta = m - k'$. Then P is $(T, \delta')$-secure
in the (t, k)-real model, where $\delta' \le \varepsilon + \sqrt{\sigma \cdot 2^{\beta}}$. If instead Cond is a
(non-average-case) seed-dependent $([H_\infty \ge k] \rightarrow_{\varepsilon} [H_2 \ge k'], t)$-condenser,
then P is $(T, \delta')$-secure in the (t, k)-real model without side information.


Using Corollary 4.3, we obtain a nearly optimal security degradation in the real
model with no side information:


Corollary 5.5. Assuming the existence of $(t, O(t^2/2^m))$-collision-resistant hash
functions, if P is $(T, \sigma)$-square-secure in the $(t, m - 2\log t + O(1))$-ideal model
with no side information, then using a collision-resistant hash function as a condenser
makes P $(T, \delta')$-secure in the $(t, m - 2\log t + O(1))$-real model with no side
information, where $\delta' \le O(t \cdot \sqrt{\sigma})$.
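The bound $\delta' \le \varepsilon + \sqrt{\sigma \cdot 2^{\beta}}$ and its specialization to $O(t \cdot \sqrt{\sigma})$ can be checked numerically. The sketch below uses assumed values for the sampler size and the square-security, and an illustrative constant 3 in the deficit $\beta = 2\log t + O(1)$.

```python
import math

def real_security_general(sigma: float, beta: float, eps: float = 0.0) -> float:
    """Bound of Lemma 5.4: delta' <= eps + sqrt(sigma * 2^beta)."""
    return eps + math.sqrt(sigma * 2.0 ** beta)

t = 2 ** 30                       # assumed sampler size
beta = 2 * math.log2(t) + 3       # assumed deficit; the constant 3 is illustrative
sigma = 2.0 ** -128               # assumed ideal square-security
bound_54 = real_security_general(sigma, beta)

# The bound is a constant multiple of t * sqrt(sigma), as in Corollary 5.5.
ratio = bound_54 / (t * math.sqrt(sigma))
```

Here the ratio is $\sqrt{8}$, the constant hidden in the $O(\cdot)$ for this choice of deficit. With $\varepsilon = 0$, this bound is smaller than $\delta \cdot 2^{\beta}$ exactly when $\sigma < \delta^2 \cdot 2^{\beta}$, so which of the two lemmas is preferable depends on how $\sigma$ compares with $\delta$.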


6 Side-Information and Fiat-Shamir 


One of the earliest and most influential applications of the Random Oracle Model 
in cryptography (predating its formalization by Bellare and Rogaway [5]) was 
to analyze the Fiat-Shamir Heuristic [15]. In the Fiat-Shamir Heuristic, a hash 
function is used to eliminate interaction in constant-round public-coin protocols, 
replacing the verifier’s random challenges with hashes of the transcript so far. If 
the hash function is modeled as a random oracle, then this heuristic is known 
to preserve soundness of the underlying protocol (up to a factor polynomial in 
the number of queries made by the adversary to the random oracle). However, 
there are no natural examples of protocols for which the Fiat-Shamir Heuristic 
has been proven sound when the hash function is implemented by an efficiently 
computable family of functions. 

The original motivation for the Fiat-Shamir Heuristic was as a method to
convert identification schemes into digital signature schemes, and the method
gave rise to many efficient digital signature schemes in practice [15, 31, 19] (albeit
with only a proof in the Random Oracle Model). Another compelling motivation
for understanding the soundness of the Fiat-Shamir Heuristic is its close
connection to the zero-knowledge property of the underlying protocols, as pointed
out by Dwork, Naor, Reingold, and Stockmeyer [14]. Dwork et al. showed that
the soundness of the Fiat-Shamir Heuristic on a given protocol is essentially
equivalent to that protocol not being (auxiliary-input) zero knowledge unless
the underlying language is in BPP. There are many constant-round public-coin
protocols whose zero knowledge status is a long-standing open problem (e.g.
ones obtained by starting with some underlying basic zero-knowledge protocol and
applying parallel repetition to make the soundness error negligible). While these
protocols cannot be black-box zero knowledge (for nontrivial languages) [16], they
may still be non-black-box (auxiliary-input) zero knowledge.


3 The forward direction is shown as follows: if there is an efficiently computable family
of hash functions for which the Fiat-Shamir heuristic is sound, then it is infeasible
to simulate a verifier that has a random hash function from the family as auxiliary
input, and obtains its challenges by applying the hash function to the transcript
so far. Indeed, an efficient simulator would constitute a prover strategy that
generates accepting proofs for the Fiat-Shamir-collapsed protocol, which would only be
possible for inputs in the language.

Indeed, Barak [2] constructed a constant-round, public-coin (non-black-box)
zero-knowledge argument system for NP (assuming the existence of
collision-resistant hash functions), thereby yielding a natural protocol on which the
Fiat-Shamir heuristic is unsound (for any efficiently computable family of hash
functions). Goldwasser and Kalai extended Barak’s techniques to construct
3-message public-coin identification schemes on which the Fiat-Shamir Heuristic
is unsound. In both of these counterexamples to the Fiat-Shamir Heuristic, the
initial interactive protocol is only computationally sound, and the results seem
to use this in an essential way.

Thus, Barak, Lindell, and Vadhan [4] conjectured that there is a sound
implementation of the Fiat-Shamir Heuristic for any statistically sound interactive
proof of language membership (and thus that there can be no constant-round
public-coin zero-knowledge proof system with negligible soundness for a language
outside BPP). Indeed, they provided a plausible property for a family of hash
functions that suffices for it to provide a sound implementation of Fiat-Shamir
on proof systems. While they conjectured that such hash families exist, it
remains open to construct one based on a standard complexity assumption.

The significance of statistical soundness for reducing interaction was further 
highlighted by the recent work of Kalai and Raz [22], who showed that a method 
proposed by Aiello et al. [1] (based on Private Information Retrieval) can be
used to convert (statistically sound) interactive proofs into 2-message argument 
systems. However, this construction does not subsume Fiat-Shamir, because the 
2-message argument system it produces is private coin (so the verifier’s first 
message cannot be published as a CRS and shared by all verifiers, as needed for 
the application to digital signatures) and it does not have the connection to zero 
knowledge mentioned above. 

Here we show that condensers for seed-dependent samplable sources that can 
handle side information (i.e. average-case condensers) imply hash functions for 
which the Fiat-Shamir Heuristic is sound for proof systems. In fact, we only 
require condensers for the case that the initial source X is uniform and the 
adversary’s side-information Z consists of a bounded-length “leakage” f(X,S) 
on the source and seed, for an efficiently computable leakage function f. We also
show a partial converse: some form of such condensers is also necessary for
the Fiat-Shamir heuristic to be sound for all proof systems.

Our results are inspired by a similarity between the definition of condensers for 
samplable sources and the aforementioned conjectures of Barak et al. [4]. While 
the existence of such condensers and hash functions remains an open problem, the 
connection between randomness condensing and the Fiat-Shamir Heuristic, along 
with our construction of condensers without side information (Theorem 4.2), seem 
to yield a clearer picture of what is needed for the Fiat-Shamir Heuristic to work. 
(In particular, we find the definition of a seed-dependent average-case condenser 
more natural than the conjectures in [4].) 
We begin by defining the restricted form of average-case condensers that we
relate to the Fiat-Shamir heuristic:


Definition 6.1 (Condensers for Leaky Sources). Let $c' \in \{1, 2, \infty\}$. An
efficient function $\mathrm{Cond} : \{0,1\}^n \times \{0,1\}^a \rightarrow \{0,1\}^m$ is an $(\varepsilon, [H_{c'} \ge k'], t)$-condenser
for leaky sources if for all probabilistic adversaries A of size at most
t who take a random source $X \leftarrow \{0,1\}^n$ and output a string $Z := A(X)$
of length a, the joint distribution $(Z, \mathrm{Cond}(X, Z))$ is $\varepsilon$-close to (Z, R), where
$H_{c'}(R|Z) \ge k'$.

When $\varepsilon = 0$, we will refer to Cond as an $([H_{c'} \ge k'], t)$-condenser for leaky
sources. The quantity $\beta \stackrel{\text{def}}{=} m - k'$ is called the entropy deficit of the condenser.


Thus, instead of allowing an arbitrary efficiently samplable source X that has
high entropy given the adversary’s side information Z, we restrict to $X \leftarrow \{0,1\}^n$
and Z of bounded length a. For natural measures of conditional entropy, this
implies that $H(X|Z) \ge n - a$, so an average-case condenser for entropy $k = n - a$
is also a condenser for leaky sources according to Definition 6.1. Note that in the
case of leaky sources, we do not provide the condenser with a seed; that is 
because any seed can be viewed as part of the uniformly random source X. 
Indeed, average-case condensers with seeds imply seedless condensers for leaky 
sources; further discussion and formal results are in the full version [13]. 

Now we define the Fiat-Shamir heuristic more precisely. Let (P, V) be a
public-coin interactive protocol, where the parties receive no inputs (except a
security parameter $\kappa$), and there are $2r + 1$ messages exchanged, starting with P. We
denote the lengths of P’s messages by $\ell = \ell(\kappa)$ and the lengths of V’s messages
by $m = m(\kappa)$.


Definition 6.2. For a language $L = L(\kappa) \subseteq \{0,1\}^{\ell}$, we say that (P, V) is a
$(t, \varepsilon)$-sound interactive argument for L iff there is no prover strategy P* of circuit
size at most t that convinces V to accept on a transcript whose first message is
not in L with probability greater than $\varepsilon$.

We say that (P, V) is an $\varepsilon$-sound interactive proof for L iff it is an $(\infty, \varepsilon)$-sound
interactive argument for L (i.e. it holds for computationally unbounded prover
strategies P*).


Ordinarily, interactive proofs are formulated with the input x (whose member- 
ship in L is being determined) being provided separately as a common input 
to P and V. However, incorporating x into the first message of the protocol is 
notationally more convenient for us. 

Fiat and Shamir [15] suggested a way to remove the interaction from protocols 
as above, by replacing the verifier’s messages with hashes of the transcript: 


Definition 6.3. For an interactive protocol (P, V) as above, $a = r \cdot \ell + (r - 1) \cdot m$,
and a family of hash functions $\mathcal{H} = \mathcal{H}(\kappa) = \{h : \{0,1\}^a \rightarrow \{0,1\}^m\}$, the
Fiat-Shamir collapse of (P, V) using $\mathcal{H}$ is the 2-message public-coin protocol (P', V')
defined as follows:

(1) V' sends P' a random hash function $H \leftarrow \mathcal{H}$,
(2) P' sends V' a tuple $(M_1, M_2, \ldots, M_{r+1}) \in (\{0,1\}^{\ell})^{r+1}$,
(3) V' accepts iff V accepts on the transcript $(M_1, R_1, M_2, R_2, \ldots, M_r, R_r, M_{r+1})$
where $R_i = H(M_1, R_1, \ldots, M_{i-1}, R_{i-1}, M_i)$ for each $i \in [r]$.

We say that the Fiat-Shamir heuristic using $\mathcal{H}$ is $(t, \varepsilon')$-sound on (P, V) iff
(P', V') is a $(t, \varepsilon')$-sound interactive argument for the language
$L' = \{(M_1, \ldots, M_{r+1}) : M_1 \in L\}$.
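The verifier V' of the collapsed protocol can be sketched in a few lines. In the sketch below, `toy_hash` is a hypothetical stand-in for a member of the hash family (keyed SHA-256 over a length-prefixed prefix, truncated), and `verify` stands for the original verifier's decision predicate on a full transcript; both names are assumptions made for illustration, not part of the paper.

```python
import hashlib
from typing import Callable, List, Sequence

M_BYTES = 16  # challenge length m in bytes (assumption)

def toy_hash(key: bytes, prefix: Sequence[bytes]) -> bytes:
    """Hypothetical stand-in for H: keyed SHA-256 over the length-prefixed
    transcript prefix (M_1, R_1, ..., M_i), truncated to M_BYTES."""
    h = hashlib.sha256(key)
    for part in prefix:
        h.update(len(part).to_bytes(4, "big") + part)
    return h.digest()[:M_BYTES]

def collapsed_verify(key: bytes,
                     prover_msgs: Sequence[bytes],
                     verify: Callable[[List[bytes]], bool]) -> bool:
    """V' of the Fiat-Shamir collapse: recompute every challenge
    R_i = H(M_1, R_1, ..., M_i), then run the original verifier V."""
    transcript: List[bytes] = []
    for m_i in prover_msgs[:-1]:                        # M_1, ..., M_r
        transcript.append(m_i)
        transcript.append(toy_hash(key, transcript))    # R_i
    transcript.append(prover_msgs[-1])                  # M_{r+1}
    return verify(transcript)
```

A cheating prover sees the key before choosing its messages, which is why soundness of the collapse depends on properties of the hash family rather than on the randomness of the challenges alone.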


Now we prove that we can use condensers for leaky sources to construct hash
functions for which the Fiat-Shamir heuristic is secure:


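Theorem 6.4. Let (P, V) be an interactive protocol as above, and let
$a = r \cdot \ell + (r - 1) \cdot m$. Given $\mathrm{Cond} : \{0,1\}^n \times \{0,1\}^a \rightarrow \{0,1\}^m$, define
$\mathcal{H} = \{h_x : \{0,1\}^a \rightarrow \{0,1\}^m\}_{x \in \{0,1\}^n}$ by $h_x(z) = \mathrm{Cond}(x, z)$.

Then if (P, V) is an $\varepsilon_1$-sound interactive proof for some language L and Cond
is an $(\varepsilon_2, [H_\infty \ge m - \beta], t)$-condenser for leaky sources, then the Fiat-Shamir
heuristic is $(t', \varepsilon')$-sound on (P, V), for $t' = t - (r - 1) \cdot t_{\mathrm{Cond}} - O(n)$ and

$$\varepsilon' = 2^{r\beta} \cdot \varepsilon_1 + \frac{2^{r\beta} - 1}{2^{\beta} - 1} \cdot \varepsilon_2 \le 2^{r\beta} \cdot (\varepsilon_1 + \varepsilon_2).$$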


For intuition about the parameters, consider the standard, polynomial-time
asymptotic setting. Here all length parameters of the proof system ($\ell$, m) are
some fixed polynomial in the security parameter $\kappa$, and we are interested in
protocols whose soundness error $\varepsilon_1$ is negligible, i.e. $\varepsilon_1 = \kappa^{-\omega(1)}$. We focus on
constant-round proof systems, so r = O(1). We take the length $n = \mathrm{poly}(\kappa)$
of the condenser source to be significantly larger than $m + a = r \cdot (\ell + m)$.
This means that the condenser should work for sources with entropy at least
$k = n - a$, which is significantly larger than m. By analogy with Theorem 4.2,
we can hope for the output to have min-entropy deficiency $\beta = O(\log t)$, which
is $O(\log \kappa)$ for any polynomial $t = t(\kappa)$, possibly with some negligible statistical
difference $\varepsilon_2 = \kappa^{-\omega(1)}$. Thus the new soundness error satisfies

$$\varepsilon' \le 2^{r\beta} \cdot (\varepsilon_1 + \varepsilon_2) = 2^{O(\log \kappa)} \cdot (\kappa^{-\omega(1)} + \kappa^{-\omega(1)}) = \kappa^{-\omega(1)},$$

which is still negligible.

For intuition about the proof, consider a cheating prover strategy that, given
the description X of a random hash function from the family, tries to construct
a transcript $(M_1, R_1, \ldots, M_r, R_r, M_{r+1})$ such that $M_1 \notin L$, the original verifier
accepts, and each $R_i$ is the hash of the prefix preceding it, i.e.

$$R_i = h_X(M_1, R_1, \ldots, M_i) = \mathrm{Cond}(X, (M_1, R_1, \ldots, M_i)).$$

Viewing $Z_i = (M_1, R_1, \ldots, M_i)$ as the adversary’s side information (which is of
length at most $r \cdot \ell + (r - 1) \cdot m$), the condenser property says that $R_i$ is $\varepsilon_2$-close to
having min-entropy deficiency at most $\beta$ given the prefix $M_1, R_1, \ldots, M_i$.
Compared to $R_i$ being uniform and independent of the prefix, this should increase
the soundness error by an additive $\varepsilon_2$ and a multiplicative $2^{\beta}$. Incurring this
blow-up for each of the rounds i yields the bound in the theorem. The formal
proof is given in the full version [13].
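The per-round blow-up described above can be iterated numerically. The sketch below reads the bound of Theorem 6.4 as $\varepsilon' = 2^{r\beta} \varepsilon_1 + \frac{2^{r\beta}-1}{2^{\beta}-1} \varepsilon_2$ and checks that it coincides with applying the map $\varepsilon \mapsto 2^{\beta} \varepsilon + \varepsilon_2$ once per round; this reading, the function names, and the sample parameters are assumptions made for illustration, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def fs_soundness_closed_form(eps1, eps2, beta: int, r: int) -> Fraction:
    """Closed form 2^(r*beta)*eps1 + ((2^(r*beta) - 1)/(2^beta - 1))*eps2,
    for integer beta >= 1."""
    big = Fraction(2) ** (r * beta)
    return big * eps1 + (big - 1) / (Fraction(2) ** beta - 1) * eps2

def fs_soundness_by_rounds(eps1, eps2, beta: int, r: int) -> Fraction:
    """Apply the per-round blow-up eps -> 2^beta * eps + eps2, r times."""
    eps = Fraction(eps1)
    for _ in range(r):
        eps = Fraction(2) ** beta * eps + eps2
    return eps

# Hypothetical parameters: 3 rounds, deficit 10 bits.
eps1, eps2 = Fraction(1, 2 ** 80), Fraction(1, 2 ** 60)
closed = fs_soundness_closed_form(eps1, eps2, beta=10, r=3)
iterated = fs_soundness_by_rounds(eps1, eps2, beta=10, r=3)
upper = Fraction(2) ** 30 * (eps1 + eps2)
```

For r = 1 both functions return $2^{\beta} \varepsilon_1 + \varepsilon_2$, the form used for three-message protocols.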
Many interactive proofs of interest have only three messages (i.e. r = 1 above)
and have optimal soundness $\varepsilon_1 = 1/2^m$, meaning that for every initial prover
message not in L, there is at most 1 verifier challenge that can lead to an
accepting transcript. Examples include parallel repetitions of Blum’s Hamiltonicity
protocol [6], the Goldwasser-Micali-Rackoff Quadratic Residuosity Protocol (to
which Fiat-Shamir was originally applied) [18], and any $\Sigma$-protocol [10]. Setting
r = 1 and $\varepsilon_1 = 1/2^m$, we see that the resulting soundness error is $\varepsilon' = 2^{\beta}/2^m + \varepsilon_2$,
which is small even for entropy deficiency $\beta$ that is quite close to m, i.e. the
output entropy of the condenser need only be $k' = m - \beta = \log(1/\varepsilon_3)$ to achieve
soundness error $\varepsilon_2 + \varepsilon_3$:


Corollary 6.5. Let Cond, H, and (P, V) be as in Theorem 6.4. Suppose further 
that (P, V) has 3 messages (i.e. r = 1), and has soundness ε_1 = 1/2^m, where m 
is the length of the verifier’s challenge. 
Then if Cond is a (ε_2, [H_∞ ≥ log(1/ε_3)], t)-condenser for leaky sources 
computable in time t_Cond, it follows that the Fiat-Shamir heuristic is (t', ε')-sound 
on (P, V), for t' = t − O(n) and ε' = ε_2 + ε_3. 
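The arithmetic behind the corollary can be checked with a short sketch. It assumes the r = 1 bound discussed above, namely a multiplicative 2^δ loss on ε_1 = 1/2^m plus an additive ε_2; the function name and the concrete parameters (m = 128, output entropy 40) are illustrative choices, not values taken from the paper.

```python
from fractions import Fraction

def fiat_shamir_soundness(m: int, delta: int, eps2: Fraction) -> Fraction:
    """Bound eps' = 2^delta / 2^m + eps2 for a 3-message protocol (r = 1).

    m     : length of the verifier's challenge (so eps1 = 1 / 2^m)
    delta : min-entropy deficiency of the condenser output
    eps2  : statistical error of the condenser
    """
    if not 0 <= delta <= m:
        raise ValueError("entropy deficiency must satisfy 0 <= delta <= m")
    return Fraction(2 ** delta, 2 ** m) + eps2

# Illustrative (assumed) parameters: a 128-bit challenge whose hashed
# replacement is only guaranteed k' = 40 bits of min-entropy.
m = 128
k_out = 40                      # k' = m - delta = log(1/eps3)
delta = m - k_out               # deficiency is 88 bits
eps3 = Fraction(1, 2 ** k_out)
eps2 = Fraction(1, 2 ** 40)

bound = fiat_shamir_soundness(m, delta, eps2)
```

With these numbers the bound equals ε_2 + ε_3 exactly, even though the deficiency (88 bits) is close to m.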


Theorem 6.4 and Corollary 6.5 are stated using average min-entropy as the 
entropy measure for the output of the condenser. We now discuss their extensions 
to other entropy measures. 

If the condenser output is only guaranteed to have high collision entropy 
given the seed and the adversary’s side information, we can deduce that it is 
statistically close to having high average min-entropy. Indeed, if H_2(A|B) ≥ k, 
then for every γ > 0, (A, B) is γ-close to some (A', B) such that H_∞(A'|B) ≥ 
k − log(1/γ). Thus we can switch from min-entropy to collision entropy at a price 
of increasing the entropy deficiency by at most log(1/γ) and increasing ε by at 
most γ. 

If the condenser output is only guaranteed to have high Shannon entropy, we 
can only deduce that the Fiat-Shamir Heuristic has soundness error bounded 
by a constant. This is still quite nontrivial, and indeed the soundness error can 
be made negligible without adding interaction by repeating the heuristic with 
several independent hash functions. This case (obtaining constant error using 
condensers for Shannon entropy) actually follows from the results in [4] and 
the connection between condensers for leaky sources and the conjectures in [4]. 
Moreover, in the full version [13], we give a converse, that soundness of the 
Fiat-Shamir transform implies the existence of condensers for leaky sources. 
Abstract. Verifiable random functions (VRFs) are pseudorandom functions 
with the additional property that the owner of the seed SK can 
issue publicly-verifiable proofs for the statements “f(SK,x) = y”, for 
any input x. Moreover, the output of VRFs is guaranteed to be unique, 
which means that y = f(SK,x) is the only image that can be proven 
to map to x. Despite their popularity, constructing VRFs seems to be a 
challenging task and only a few constructions based on specific number- 
theoretic problems are known. Basing a scheme on general assumptions 
is still an open problem. Towards this direction, Brakerski et al. showed 
that verifiable random functions cannot be constructed from one-way 
permutations in a black-box way. 

In this paper we continue the study of the relationship between VRFs 
and well-established cryptographic primitives. Our main result is a sepa- 
ration of VRFs and adaptive trapdoor permutations (ATDPs) in a black- 
box manner. This result sheds light on the nature of VRFs and is inter- 
esting for at least three reasons: 


— First, the separation result of Brakerski et al. gives the impression 
that VRFs belong to the “public-key world”, and thus their rela- 
tionship with other public-key primitives is interesting. Our result, 
however, shows that VRFs are strictly stronger and cannot be constructed 
(in a black-box way) from primitives such as public-key encryption 
(even CCA-secure), oblivious transfer, and key-agreement. 

— Second, the notion of VRFs is closely related to weak verifiable ran- 
dom functions and verifiable pseudorandom generators which are 
both implied by TDPs. Dwork and Naor (FOCS 2000) asked whether 
there are transformations between the verifiable primitives similar to 
the case of “regular” PRFs and PRGs. Here, we give a negative 
answer to this problem showing that the case of verifiable random 
functions is essentially different. 

— Finally, our result also shows that unique signatures cannot be in- 
stantiated from ATDPs. While it is well known that standard sig- 
nature schemes are equivalent to OWFs, we essentially show that 
the uniqueness property is crucial to change the relations between 
primitives. 
1 Introduction 


Verifiable random functions (VRFs) were introduced by Micali, Rabin, and 
Vadhan [1]. VRFs are pseudorandom functions with the additional property that they 
provide a proof verifying the input-output relationships. Formally, a VRF is 
defined by a key pair (SK, PK) such that the secret seed SK allows the 
evaluation of the function y ← F(SK, x) on any input x and the generation of a proof 
π. This proof is publicly verifiable, i.e., given the public key PK one can 
efficiently verify (using π) that the statement “F(SK, x) = y” holds. For security, 
VRFs must satisfy two properties: pseudorandomness and uniqueness. Roughly 
speaking, pseudorandomness states that the function looks random at any input 
x for which no proof has been issued. Uniqueness guarantees that for any x, 
there exists only one image y for which a valid proof can be produced (even for 
maliciously chosen public keys). 

In some sense a VRF can be seen as the public-key equivalent of a 
pseudorandom function. This fascinating primitive has many applications, both theoretical 
and practical: 3-round resettable zero-knowledge [2], non-interactive lottery 
systems and micropayment schemes [8], a verifiable transaction escrow scheme [4], 
and updatable zero-knowledge sets [5]. However, despite their popularity, 
constructing VRFs seems to be challenging. In particular, only a few schemes are 
known so far (see Section 1.3 for a brief description of these 
works). Furthermore, all known schemes are based on specific number-theoretic 
problems such as RSA or different assumptions relying on bilinear maps. 
Constructing a VRF based on general assumptions is still an open problem. 

In modern cryptography, almost all cryptographic primitives base their security 
on unproven computational assumptions that are considered reasonable by the 
community.¹ In particular, the existence of one-way functions (OWFs) is one of 
the major open problems in cryptography. A common methodology for proving 
the security of a cryptographic primitive, and for better understanding its relation 
to other primitives, is the black-box reduction technique, which can be described as 
follows. Let P and Q be two primitives. A construction of P from Q is black-box if 
the primitive P has only oracle access to Q (i.e., P does not have access to the code 
of this primitive, but can evaluate it). A security reduction of P to Q is black-box if 
for any (efficient) adversary A that breaks P there exists an (efficient) algorithm S 
that has black-box access to A and breaks Q. This approach has been extensively 
formalized by Reingold et al., who gave different “flavors” of black-box reductions 
depending on the “degree” of black-box access [11]. 

Black-box constructions and black-box proofs clearly give a limited view of 
the relation between the different primitives, as no conclusions beyond the 
black-box access can be made. Nevertheless, the approach is well established as most 
of the cryptographic proofs are black-box and it is strong enough to show that 
many cryptographic primitives, such as pseudorandom functions, digital signa- 
tures, private-key encryption, are equivalent to the existence of one-way functions 


¹ With the exception of a few cases that are proven secure in an 
information-theoretic sense. 
(OWFs), which is considered to be one of the most basic assumptions. On the 
other hand, other primitives (e.g., public-key encryption) are believed to exist 
only under stronger assumptions (e.g., the existence of trapdoor permutations). 
Though such primitives and/or assumptions look different, it might be possible 
that many of them are related or even equivalent. Therefore, identifying the min- 
imal assumptions on which one can base the security of a primitive is considered 
one of the most important goals for a better and deeper understanding of the 
cryptography world. 

On the negative side, Impagliazzo and Rudich introduced a methodology for 
proving separations between primitives in the sense of black-box constructions, 
e.g., proving that Q does not imply P in a black-box way [12]. In their work 
they ruled out any black-box construction of key-agreement protocols (KA) 
from one-way functions. Gertner et al. show that the breakthrough result of 
Impagliazzo and Rudich can be seen as defining two separated worlds in which 
the cryptographic primitives can be divided: the “private cryptography” world 
that contains all those primitives that are equivalent to OWFs, and private-key 
encryption; the “public cryptography” world that contains harder primitives 
such as trapdoor permutations, public-key encryption (PKE), KA and oblivious 
transfer (OT) [13]. 

It is worth mentioning that another methodology, called meta-reductions, for 
separating primitives in a black-box sense is known. Since we do not follow this 
approach, we refer the reader to e.g., [A506]. 


1.1 Our Results 


We investigate the relationship between verifiable random functions and 
well-studied cryptographic primitives. The first step towards this goal was recently 
taken by Brakerski, Goldwasser, Rothblum, and Vaikuntanathan, who separated 
VRFs from one-way permutations [17]. The authors introduce the notion of weak 
verifiable random functions (wVRFs) that can be seen as the public-key 
analogue of weak PRFs: pseudorandomness only holds with respect to randomly 
chosen inputs. Moreover, they construct wVRFs from (enhanced) trapdoor 
permutations and show that wVRFs are essentially equivalent to non-interactive 
zero-knowledge proof (NIZK) systems in the common reference string model. 
In the private-key setting, it is well known that “regular” PRFs can be 
constructed from weak PRFs in a black-box way [18,19]. Thus, a natural 
direction to study the relation between the primitives is to build a VRF out of any 
wVRF. 

Another work that is closely related to this topic is the study of verifiable pseu- 
dorandom generators (VPRGs) due to Dwork and Naor [20]. Roughly speaking, 
a VPRG is a pseudorandom generator that allows the owner of the seed to prove 
the correctness of subsets of the generated bits while the other bits remain indis- 
tinguishable from random. Dwork and Naor constructed VPRGs from trapdoor 
permutations. Again, in the case of “regular” PRFs we know how to turn a PRG 
into a PRF in a black-box way [21]. Dwork and Naor left open the question if a 
similar transformation can be found in the public key setting [20], namely: 


Is it possible to construct a VRF from VPRGs and/or weak-VRFs in a 
black-box way? 


In this paper, we give a negative answer to this question and, more generally, 
we show that no black-box constructions of VRFs from (enhanced) trapdoor 
permutations exist. 


Theorem 1 (informal). There exists no black-box reduction of verifiable ran- 
dom functions to trapdoor permutations. 


Our result is actually more general than the above indicates; it separates the 
weaker primitive of verifiable unpredictable functions (VUFs) from the stronger 
primitive of adaptive trapdoor functions. The difference between VRFs and 
VUFs is that in the latter the output should be unpredictable instead of 
pseudorandom. Therefore, VUFs can also be seen as “unique signatures”, where, for 
every public key, each message can have at most one valid signature.² 

Adaptive trapdoor functions (ATDFs), recently introduced by Kiltz, Mohas- 
sel, and O’Neill in [22], are essentially strictly stronger than trapdoor functions 
as the adversary is given access to an inversion oracle. 


Implications of Our Result. Our result sheds light on the nature of VRFs 
and explains why this primitive seems so hard to construct. First, given the 
separation result of Brakerski et al., one can naturally think of VRFs as though 
they belong to the “public cryptography” world. Then, if we consider the 
relationship between VRFs and the other public-key primitives, our result highlights 
that VRFs are much stronger as they cannot be implied by most of the primitives 
in this world: basically everything which is implied by TDPs, e.g. 
semantically-secure public-key encryption, oblivious transfer, key-agreement. Moreover, since 
ATDPs imply CCA-secure PKE [22], VRFs are separated even from it. 
On the positive side we observe that we can obtain a construction of VRFs 
from identity-based encryption with unique key derivation following the idea of 
Abdalla et al. [9].³ Combining this positive result with our impossibility result 
confirms the impossibility result of IBE from TDPs [23]. 

Second, our result points out the hardness of achieving the uniqueness prop- 
erty in the context of digital signatures: While signature schemes are equivalent 
to OWFs, unique signatures cannot be instantiated from (adaptive) TDPs in a 
black-box way. 

Finally, since both weak-VRFs and VPRGs are implied by TDPs, our result 
rules out the possibility of constructing VRFs from weak-VRFs and/or VPRGs 


² At this stage, it is interesting to observe that unique and deterministic signatures are 
two distinct primitives. Consider for example the signature σ = σ'||0 where σ' is 
deterministic and the verification algorithm ignores the last bit. Then it is obvious 
that uniqueness could be easily violated by flipping the last bit. 

3 Precisely, the unique key derivation algorithm immediately implies a VUF, which 
can then be turned into a VRF using the original idea of Micali, Rabin and Vadhan. 
(in a black-box way). Thus, it seems that there is no hope that the approaches 
used in the private-key world to build PRFs from weak PRFs and PRGs can 
be adapted to the case of the publicly verifiable primitives. This shows that the 
verifiable analogues of these primitives are essentially different. 


1.2 Overview of the Techniques 


Our starting point is the so-called “two oracles” technique of Hsiao and Reyzin 
[24]. The main idea of this technique is to construct two oracles, say O and B, 
such that O is used in the constructions, whereas both oracles O and B can 
be accessed by the adversaries. This approach is slightly weaker than the single 
oracle technique because it “only” rules out fully-black-box reductions (instead 
of any black-box reduction). 


Our Oracles. In our case the oracle O is an ideal random trapdoor permutation 
oracle that is modeled as a triple of random functions (g, e, d) such that: g(·) 
maps trapdoors to public keys; e(ek, ·) is a random permutation for every public 
key ek; and d(td, ·) is the inverse of e(ek, ·) when g(td) = ek. Due to the fact 
that O is truly random, O is secure even in the sense of adaptive trapdoor 
permutations. The oracle B is designed to break any black-box construction of 
VUF based on O. 
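To make the triple (g, e, d) concrete, the following is a minimal sketch of such an oracle over a tiny domain. It is only a toy: strings are represented as integers in [0, 2^n), the tables are sampled with Python's (non-cryptographic) `random` module, and the class name and the lazy sampling of permutations are assumptions made for illustration, not part of the paper's formal definition.

```python
import random

class RandomTDPOracle:
    """Toy random trapdoor permutation oracle O = (g, e, d) on n-bit values."""

    def __init__(self, n: int, seed: int = 0):
        self.size = 1 << n
        self._rng = random.Random(seed)
        # g: a random function mapping each trapdoor td to a public key ek.
        self._g = [self._rng.randrange(self.size) for _ in range(self.size)]
        # One random permutation (and its inverse) per public key, sampled lazily.
        self._tables = {}

    def _perm(self, ek: int):
        if ek not in self._tables:
            forward = list(range(self.size))
            self._rng.shuffle(forward)
            inverse = [0] * self.size
            for x, y in enumerate(forward):
                inverse[y] = x
            self._tables[ek] = (forward, inverse)
        return self._tables[ek]

    def g(self, td: int) -> int:
        return self._g[td]

    def e(self, ek: int, x: int) -> int:
        return self._perm(ek)[0][x]

    def d(self, td: int, y: int) -> int:
        # d(td, .) inverts e(ek, .) for ek = g(td).
        return self._perm(self.g(td))[1][y]
```

For example, with `O = RandomTDPOracle(4)`, every trapdoor `td` satisfies `O.d(td, O.e(O.g(td), x)) == x` for all 4-bit `x`.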

Therefore, the core of our separation theorem is the definition of the weakening 
oracle B. The proof then consists of two main parts: 


(i) showing an efficient adversary that can break the unpredictability of the 
VUF by making a polynomial number of queries to B; 

(ii) showing an ATDP construction that is secure against any adversary that 
makes at most polynomially-many oracle queries. 


The design of B is rather technical. In particular, the main difficulty is to prevent 
an attacker from exploiting B to break the one-wayness of the ATDP. A naive 
construction would be an oracle that takes as input a VUF public key and returns 
y* ← F(SK, x*), i.e., the evaluation of the function on a random point x*. This 
oracle would clearly break the unpredictability of the VUF, but it would also 
be too strong. Consider, for instance, an adversary A that is given as input a 
public key ek* of a trapdoor permutation and that is challenged to invert it on 
a random point b*. Now, A might encode (ek*, b*) into PK in such a way that 
the evaluation of F(SK, x*) requires inverting b*. But then the attacker would 
learn all information about the inverse of b*. To prevent these “dangerous” queries 
we modify B such that it takes as input a certain number of triples (x_i, y_i, π_i), 
where π_i is a valid proof for “F(SK, x_i) = y_i”. The idea follows from the intuition 
that the attacker can encode b* (and ek*) into PK in only two ways: 


(i) F(SK,-) needs to invert b* on a large fraction of the inputs, 
(ii) F(SK,-) needs to invert b* only on a negligible fraction of the inputs. 


Now, suppose that A encodes b* into PK as defined in the first case. In order 
to query the oracle, A has to provide valid proofs. But if A can compute all 
proofs, then the attacker must already know the inverse of b*. Otherwise, if b* is 
encoded into PK as described in the second case, then the probability that 
evaluating F(SK, x*) on a random input x* requires inverting b* is negligible. 
Hence, returning y* does not reveal any useful information to A. Although this 
idea seems very promising, it raises another issue. In fact A might overcome this 
limitation by choosing all the x_i from the small fraction that does not require 
inverting b*. We solve this issue by defining a two-step oracle B = (B_1, B_2) such 
that B_1 chooses the values x_i and B_2 is the actual oracle as described above, 
which works properly only if the inputs x_i were chosen by B_1. 

Finally, an important detail towards the definition of B is that it simulates 
the run of F^O(SK, x*) using a different oracle O' and a different secret key 
SK' such that SK' still corresponds to PK under O'. The idea is that, if O' is 
close enough to O (as should be the case while trying to break the VUF), then 
evaluating F^{O'}(SK', x*) produces the same output as F^O(SK, x*). On the other 
hand, with high probability O and O' are not close when an ATDP adversary 
invokes B. 


1.3 Other Related Work 


Verifiable Random Functions. Goldwasser and Ostrovsky introduce the no- 
tion of unique signatures (calling them invariant signatures) and they show that 
in the common random string model they are equivalent to non-interactive zero- 
knowledge proofs [25]. Later, Micali, Rabin and Vadhan formally define VRFs 
and propose a construction (in the plain model) [1]. The authors follow two main 
steps: (1) they construct a verifiable unpredictable function (VUF) based on the 
RSA problem and then (2) they show a generic transformation to convert a VUF 
into a VRF using the Goldreich-Levin theorem [26] (that extracts one random 
bit from polynomially-many unpredictable bits). The hope of this two-step 
approach is that a VUF should be easier to realize than a VRF, but the second 
step is very inefficient. Finally, Lysyanskaya proposes a VUF relying on a strong 
version of the Diffie-Hellman assumption [6]. 

The subsequent works suggest direct and (more) efficient constructions of 
VRFs without relying on the Goldreich-Levin transformation. Dodis suggests 
an instantiation on the sum-free generalized DDH assumption [7], and Dodis 
and Yampolskiy give a construction based on the bilinear Diffie-Hellman inver- 
sion assumption [8]. Abdalla, Catalano, and Fiore show the relationship between 
VRFs and a certain class of identity-based encryption schemes [9]. Moreover, the 
authors propose a construction based on the weak bilinear Diffie-Hellman inver- 
sion assumption. All the schemes mentioned so far share the limitation of sup- 
porting only a small domain (i.e., of superpolynomial size). The only exception 
is the recent scheme by Hohenberger and Waters, who give the first construction 
having a large input space [10]. Another closely related work is that of Dodis and 
Puniya, who construct NIZK from verifiable random permutations (VRPs), which 
are the verifiable analog of pseudorandom permutations [27]. The authors also 
show how to convert a VRF into a VRP. 
Black-Box Separations. After the seminal result of Impagliazzo and Rudich 
many follow-up works studied the relation between different primitives, such as, 


e.g., {13]28]29/30[23/23)31/32|. We discuss these works in the full version [14]. 
2 Preliminaries 


Adaptive Trapdoor Permutations. Adaptive trapdoor permutations (ATDPs) 
are defined similarly to trapdoor permutations, but in the security definition 
the adversary is provided with an oracle that inverts the function on arbitrary 
images (except on the challenge value). A formal definition is given in ; 


Verifiable Random Functions. Verifiable random functions (VRFs) are similar 
to pseudorandom functions, but differ in two main aspects: Firstly, the output of 
the function is publicly verifiable, i.e., there exists an algorithm Π that returns 
a proof π which shows that y is the output of the function on input x. Secondly, 
the output of the function is unique, i.e., no two images (and proofs) exist that 
verify under the same preimage. 


Definition 1 (Verifiable Random Functions). A family of functions F = 
{f_s : {0,1}^{n(λ)} → {0,1}^{m(λ)}}_{s ∈ {0,1}^{seed(λ)}} is a family of Verifiable Random 
Functions if there exists a tuple of algorithms (KG, F, Π, V) with the following 
functionalities: 


KG(1^λ) outputs a pair of keys (PK, SK). 

F(SK, x) is a deterministic algorithm that evaluates f_s(x). 

Π(SK, x) is an algorithm that outputs a proof π related to x. 

V(PK, x, y, π) outputs 1 if π is a valid proof for “f_s(x) = y”, else it outputs 0. 


A tuple (KG, F, Π, V) is said to be a VRF if it satisfies the following properties: 


Domain Range Correctness For all values x ∈ {0,1}^{n(λ)}, over the choices 
of (PK, SK), we have that F(SK, x) ∈ {0,1}^{m(λ)} holds with all but negligible 
probability. 

Completeness For all x ∈ {0,1}^{n(λ)}, if Π(SK, x) = π and F(SK, x) = y then 
V(PK, x, y, π) outputs 1 with overwhelming probability (over the choices of 
(PK, SK) and the coin tosses of V). 

Uniqueness There exist no values (PK, x, y_1, y_2, π_1, π_2), except with negligible 
probability over the coin tosses of V, such that for distinct y_1 and y_2 it holds 
that V(PK, x, y_1, π_1) = V(PK, x, y_2, π_2) = 1. 

Pseudorandomness For all PPT adversaries A = (A_1, A_2) we require that 
the probability that A succeeds in the experiment pseudo_A^F is at most 1/2 + negl(λ), 
where the experiment is defined in Figure 1. 


Verifiable unpredictable functions (VUFs) are similar to VRFs, except that 
unpredictability must hold instead of pseudorandomness: 


Definition 2 (Verifiable Unpredictable Functions). A tuple (KG, F, Π, V) 
is a verifiable unpredictable function if the probability that any PPT adversary A 
succeeds in the experiment predict_A^F, defined in Figure 1, is at most negligible. 
Experiment pseudo_A^F 
  (PK, SK) ← KG(1^λ); 
  (x*, state) ← A_1^{Func(SK,·)}(PK) 
  b ←$ {0,1}; 
  y_0 ← F(SK, x*); y_1 ←$ {0,1}^{m(λ)} 
  b' ← A_2^{Func(SK,·)}(state, y_b) 
  Output 1 iff b' = b 
  and x* was not asked 
  to the Func(SK,·) oracle. 


Experiment predict_A^F 
  (PK, SK) ← KG(1^λ); 
  (x*, y*) ← A^{Func(SK,·)}(PK) 
  Output 1 iff y* = F(SK, x*) and 
  x* was not asked 
  to the Func(SK,·) oracle. 


Fig. 1. This figure shows the experiments of pseudorandomness and unpredictability. In 
both experiments the oracle Func(SK,·) computes F(SK,·) and Π(SK,·) and returns 
their output. 
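The unpredictability experiment of Figure 1 can be sketched as a small test harness. The VUF used here is an assumption chosen purely for illustration: a toy full-domain-hash RSA function with tiny hard-coded primes, where y verifies itself and the proof π is empty. It is not the RSA-based construction of [1], it offers no security at this size, and uniqueness is only argued for honestly generated keys.

```python
import hashlib

# Toy parameters (assumed for illustration only; far too small to be secure).
P, Q, E = 1009, 1013, 65537

def keygen():
    n = P * Q
    d = pow(E, -1, (P - 1) * (Q - 1))   # modular inverse, Python 3.8+
    return (n, E), (n, d)               # (PK, SK)

def H(x: bytes, n: int) -> int:
    """Hash an input into Z_n (stand-in for a full-domain hash)."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % n

def F(sk, x: bytes) -> int:
    n, d = sk
    return pow(H(x, n), d, n)

def V(pk, x: bytes, y: int) -> bool:
    # y -> y^E mod n is a permutation of Z_n, so at most one y passes.
    n, e = pk
    return 0 <= y < n and pow(y, e, n) == H(x, n)

def predict_experiment(adversary) -> int:
    """Experiment predict: output 1 iff y* = F(SK, x*) and x* was never queried."""
    pk, sk = keygen()
    asked = set()

    def func(x: bytes) -> int:          # the Func(SK, .) oracle
        asked.add(x)
        return F(sk, x)

    x_star, y_star = adversary(pk, func)
    return int(y_star == F(sk, x_star) and x_star not in asked)

# An adversary that merely replays an oracle answer does not win.
replay = lambda pk, func: (b"msg", func(b"msg"))
```

Calling `predict_experiment(replay)` returns 0, because the predicted point was already asked to the oracle.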


3 The Black-Box Separation 


We first give a high-level overview of the main ideas of our proof before going 
into the details afterwards. Our starting point is the “two oracles” separation 
technique of Hsiao and Reyzin [24]. In the context of VRFs, we have to construct 
two oracles O and B relative to which ATDPs exist while VUFs do not. In 
particular, the constructions are restricted to have black-box access only to O, 
while the adversary may access both O and B. 

The core of our separation is the pair of oracles O and B. The oracle O = (g, e, d) 
realizes a random trapdoor permutation (we give a formal definition in Section 
3.2). The second oracle is a weakening oracle such that relative to (O, B) a 
secure construction of adaptive trapdoor permutations exists while any given 
candidate (and correct) VUF construction (KG^O, F^O, Π^O, V^O) is insecure.⁴ To prove 
this result, we build an adversary that wins the unpredictability game with 
non-negligible probability. Since the description of the oracle B is rather technical, we 
first describe the high-level intuition that guides us to the design of B. 


3.1 Towards the Definition of B 


Towards the definition of such a B, the main difficulty is to design an oracle that is 
strong enough to help predict a value of the VUF while simultaneously being 
too weak to invert the ATDP. 

A naive approach for B would be one that immediately breaks the VUF, 
by taking the VUF’s public key PK and a value x as input and returning 
F^O(SK, x). Of course, any VUF construction breaks down in the presence of 
such an oracle. So, it would remain to show that an ATDP is still secure in the 
presence of such (O, B), which unfortunately is not the case. To see this, consider 
the following VUF defined through KG^O, F^O, Π^O, V^O (where Π^O(SK, ·) = 
F^O(SK, ·)): The KG^O algorithm queries ek ← g(td) on a random td ∈ {0,1}^n 
and sets PK = ek and SK = td. The function evaluation algorithm on input x 


⁴ By (O, B) we mean that the algorithm A^{(O,B)} gets access to both oracles. 
obtains y ← d(td, x) and outputs y. V(PK, x, y) simply checks that e(ek, y) = x. 
Observe that this construction is sound and unique (but trivially insecure). Now, 
we construct an adversary A against the ATDP that exploits the above defined 
B to invert the challenge (ek*, b*). This attacker inverts the challenge by simply 
submitting (PK = ek*, x = b*) to B! This means that the oracle B that we 
sketched before is too strong and reveals too much information. 
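The attack on the naive oracle can be replayed on a toy instance. In the sketch below, the helper names (`make_oracle`, `make_naive_B`, `atdp_attack`), the 6-bit domain, and the table-based sampling are all illustrative assumptions; the naive B is modeled as an unbounded oracle that simply looks up some trapdoor for the submitted key.

```python
import random

def make_oracle(n: int, seed: int = 1):
    """Toy random trapdoor permutation oracle (g, e, d) on n-bit integers."""
    rng = random.Random(seed)
    size = 1 << n
    g_table = [rng.randrange(size) for _ in range(size)]
    perms = {}

    def tables(ek):
        if ek not in perms:
            fwd = list(range(size))
            rng.shuffle(fwd)
            inv = [0] * size
            for x, y in enumerate(fwd):
                inv[y] = x
            perms[ek] = (fwd, inv)
        return perms[ek]

    g = lambda td: g_table[td]
    e = lambda ek, x: tables(ek)[0][x]
    d = lambda td, y: tables(g(td))[1][y]
    return g, e, d, g_table

def make_naive_B(g_table, d):
    """Naive weakening oracle: on (PK, x) return F(SK, x) = d(td, x)."""
    def B(pk, x):
        sk = g_table.index(pk)   # unbounded oracle: find some td with g(td) = pk
        return d(sk, x)
    return B

def atdp_attack(B, ek_star, b_star):
    """ATDP adversary: submit the challenge itself as (PK, x)."""
    return B(ek_star, b_star)

n = 6
g, e, d, g_table = make_oracle(n)
B = make_naive_B(g_table, d)

rng = random.Random(42)
td_star = rng.randrange(1 << n)      # challenger's trapdoor
ek_star = g(td_star)
a_star = rng.randrange(1 << n)       # secret preimage
b_star = e(ek_star, a_star)          # challenge point

recovered = atdp_attack(B, ek_star, b_star)
```

A single query to the naive B returns the preimage of the challenge, which is exactly why the oracle must be weakened.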

As one can guess, the problem lies in those queries to B that are 
“dangerous” in the sense that they extract too much useful information to invert the 
ATDP. Starting from this (toy) example we modify B to prevent such 
“dangerous queries”. The first important observation is that our adversary against the 
unpredictability only needs to predict some value, rather than a specific one. 
This means that the attacker only needs to find y* for a fresh x* ∈ {0,1}^n. 
Therefore, our first modification consists of changing the input that is provided to B. 
Basically, we let B choose the x* on which it evaluates y* ← F^O(SK, x*). This new 
definition of B still allows us to break the security of the VUF and it also avoids 
direct inversion queries as the attacker can no longer submit x directly to B. 

However, this modification is not sufficient to prevent an ATDP adversary 
from exploiting the access to B. The problem is that an attacker A might encode its 
challenge (ek*, b*) into the public key PK. For instance, A could create and 
submit a public key such that any function evaluation requires inverting b* 
according to the permutation e(ek*, ·). We show how to prevent such queries 
starting from the following basic intuition. 

Assume that a value b ∈ {0,1}^n is (somehow) encoded into the public key 
PK and recall that we denote by x the input of F^O(SK, ·). Then we have two 
mutually exclusive cases: 


1. F^O(SK, ·) inverts b on a large fraction of the x’s; 
2. F^O(SK, ·) inverts b only on a negligible fraction of the x’s (even on no x in 
the most extreme case). 


Now, recall that a VUF attacker is allowed to query the function (and see the 
corresponding proofs) for inputs of her choice. Therefore, if A queries the function 
oracles on a sufficiently large number of the x’s, then A will learn the inverses 
of all the “frequent” b’s of type 1 with high probability. On the other hand, for 
any b of type 2, the probability that running F^O(SK, x) on a random x asks to 
invert b is negligible. 


Ensuring A Has Access to the Function Oracles. The above intuition 
suggests that any algorithm querying B must provide as additional input sufficiently 
many triples (x_i, y_i, π_i) such that π_i is a valid proof for “F^O(SK, x_i) = y_i”. This 
way, if an ATDP adversary embeds a “type 1” b into PK, then it must know its 
inverse in order to provide the above triples. Or, if a “type 2” b is encoded into PK, 
then with high probability the attacker A will not gain any further information 
on its inverse from seeing the evaluation of F^O(SK, x*) for a random x*. 
Although such a restriction seems to capture the right intuition, we observe 
that it is not sufficient to prevent the adversary from exploiting B. To see this, 
assume that A encodes its challenge (ek*, b*) into PK such that b* is of type 
1, namely F^O(SK, x) queries d(td*, b*) on a large fraction of the x’s. Then, if 
the attacker A is allowed to choose the inputs x_1, ..., x_ℓ provided to B, then it 
might take all of them from the small fraction that does not require inverting b*. 
In this case our previous argument would fail. 

Therefore, in order to prevent these dangerous queries, we do not let A choose 
the inputs x_1, ..., x_ℓ. That is, we define a two-step oracle B = (B_1, B_2) where 
B_1 chooses ℓ random inputs, and B_2 evaluates the VUF only if it gets as input 
values and proofs for x’s that were chosen by B_1. For this we will require that 
B_1 is essentially a random function that, given as input a VUF public key and 
a collection of oracle circuits implementing a VUF, outputs ℓ random strings. 
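The gating role of B_1 can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the paper: SHA-256 stands in for the random function B_1, B_1 depends only on the public key (the oracle circuits are omitted), and only the acceptance check of B_2 is modeled, not the evaluation it performs afterwards. All function names are hypothetical.

```python
import hashlib
from typing import Callable, List, Tuple

def b1(pk: bytes, ell: int, n_bytes: int = 4) -> List[bytes]:
    """Stand-in for B_1: a fixed 'random function' of pk yielding ell inputs."""
    return [
        hashlib.sha256(pk + i.to_bytes(4, "big")).digest()[:n_bytes]
        for i in range(ell)
    ]

def b2_accepts(
    pk: bytes,
    triples: List[Tuple[bytes, bytes, bytes]],
    verify: Callable[[bytes, bytes, bytes, bytes], bool],
    ell: int,
) -> bool:
    """Acceptance check of B_2: inputs must be exactly those chosen by B_1,
    and every (x_i, y_i, pi_i) must verify under pk."""
    xs = [x for x, _, _ in triples]
    if xs != b1(pk, ell):
        return False                 # caller picked its own inputs: reject
    return all(verify(pk, x, y, proof) for x, y, proof in triples)
```

A caller holding the function oracles can answer on the inputs `b1(pk, ell)` and is accepted, while a caller that substitutes inputs of its own choosing is rejected before any evaluation takes place.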

Furthermore, observe that this restriction is not a problem for the attacker 
that we build against the VUF, because it has access to the function oracles, 
F(SK, ·) and Π(SK, ·), that compute these values and proofs for her. On the 
other hand, an ATDP adversary now has restricted power as it does not know 
the inverse of b*. 


Avoiding Malicious Keys. Finally, the last type of dangerous queries that 
we have to handle are those where the attacker A queries B on an “invalid” 
public key PK. By “invalid” we mean that PK is not the output of an honest 
execution of the key generation algorithm KG^O(SK). The problem is again 
that an evaluation of F^O(SK, x) can reveal “sensitive” information about the 
trapdoor permutation. Indeed, observe that an execution of F^O must use the 
d(·, ·) oracle in a significant way or the VUF cannot be secure.⁵ Thus, one may 
think about designing B in such a way that it rejects any queries that involve 
invalid public keys. However, this solution is still dangerous as B might be used 
to test the validity of public keys. We solve the issue by defining B such that 
it computes the answer using a different key SK' and a different oracle O'', but 
such that the new function F^{O''}(SK', ·) behaves in almost all cases as the original one 
F^O(SK, ·). More precisely, the oracle B evaluates F^{O''}(SK', ·) using a key SK' 
(that is most likely different from SK) and an oracle O'' which is also different 
from the real oracle O. The key SK' is computed such that it corresponds to 
the “real” key PK under O'' (i.e., PK ← KG^{O''}(SK')). The idea is to construct 
O'' such that it is close to O. Then we can show that evaluating F^{O''}(SK', x) is 
basically the same as evaluating F^O(SK, x). 

The hope is that O'' differs from O in the points that may represent dangerous 
queries. If this is the case, then we are done as computing F^{O''}(SK', x) will 
not reveal sensitive information on the real ATDP. More precisely, our oracle 
B selects uniformly at random a secret key SK' and an oracle O'' such that 
PK = KG^{O''}(SK') and O'' agrees with O on those points that are already 
known to the adversary. 


Discovering All ATDP Public Keys. In order to correctly simulate a run 
of F^{O''} it is important that our oracle has discovered all the ATDP public keys 
ek that may be needed while running F^O. More precisely it needs to know all 


⁵ For instance, if F^O does not use the oracles, then an exponentially-strong adversary 
could always evaluate the circuit associated to F. 
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the public keys that were generated during the honest execution of KG? (SK). 
So, to discover these public keys we define B such that it runs V? on all the 
received triples (zi, yi, mi) and collect all the queries made by the algorithm. 
Since by Assumption [I] the algorithm KG generates at most q of such ek’s, it is 
sufficient to repeat the above step on sufficiently many triples, say qf for some 
constant c that we will specify later. This allows us to discover all the public 
keys with high probability. 


3.2 The Formal Separation Theorem 


In this section we formalize the techniques that we use to prove our result. The 
core of our proof is the description of two oracles O and B. The first oracle 
O = (g,e,d) implements a perfectly random trapdoor permutation and it is 
obvious that a secure ATDP exists relative to O (where the security follows from 
the randomness of the function). Therefore, we follow the strategy of defining a 
“weakening” oracle B whose main task is to break the security of a given VUF 
construction. This approach is formalized in the following theorem: 


Theorem 1 (formally restated). Let O = (g,e,d) be a random trapdoor
permutation oracle. Then, there exists an oracle B such that for every VUF
construction (KG^O, F^O, Π^O, V^O) which is correct and unique we have:


(i) there is an adversary A such that A^{(O,B)} breaks the security of the VUF
with non-negligible probability;

(ii) there exists an ATDP construction (G^O, E^O, D^O) relative to O such that
no adversary A^{(O,B)} can break its security with non-negligible probability.


We prove this theorem by first defining the oracles O and B in the following
paragraphs. Afterwards, we state two separate lemmata. The first one, given in
Section 4, shows the insecurity of the VUF, whereas the second lemma
(Section 5) proves the existence of a secure ATDP.


The Oracle O. We prove our separation in a relativized model where each
algorithm has access to a random trapdoor permutation oracle O = (g,e,d),
where g, e and d are sampled uniformly at random from the set of all functions
satisfying the following conditions:


– g : {0,1}^n → {0,1}^n takes a trapdoor key td and outputs a public key ek.

– e : {0,1}^n × {0,1}^n → {0,1}^n is a function that takes as input a public key
ek and a value a and outputs b. For every ek ∈ {0,1}^n, e(ek,·) is required
to be a permutation over {0,1}^n.

– d : {0,1}^n × {0,1}^n → {0,1}^n is a function that on input a pair (td, b)
outputs the unique a ∈ {0,1}^n such that e(g(td),a) = b.


Since the permutation is defined over {0,1}^n, it is easy to see that the oracle is
also an enhanced TDP.
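
To make the three conditions concrete, the following is a minimal toy sketch of such an oracle for a very small bit length n. It is an illustration under our own assumptions, not part of the proof: strings are encoded as integers, all tables are sampled eagerly (feasible only for tiny n), and the class and method names are hypothetical.

```python
import random


class ToyTrapdoorPermutationOracle:
    """Toy stand-in for O = (g, e, d) on n-bit strings encoded as integers."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.size = 2 ** n
        # g: a random function mapping each trapdoor td to a public key ek.
        self._g = [rng.randrange(self.size) for _ in range(self.size)]
        # e(ek, .): an independently sampled random permutation for every ek,
        # stored together with its inverse table.
        self._e = []
        self._e_inv = []
        for _ in range(self.size):
            perm = list(range(self.size))
            rng.shuffle(perm)
            inverse = [0] * self.size
            for a, b in enumerate(perm):
                inverse[b] = a
            self._e.append(perm)
            self._e_inv.append(inverse)

    def g(self, td):
        return self._g[td]

    def e(self, ek, a):
        return self._e[ek][a]

    def d(self, td, b):
        # The unique a such that e(g(td), a) = b.
        return self._e_inv[self._g[td]][b]


oracle = ToyTrapdoorPermutationOracle(n=4, seed=1)
```

In this sketch d is derived from g and e rather than sampled separately, which matches the requirement that d(td, b) inverts e(g(td), ·).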
Notation. We write A^O to denote that an algorithm A is given access to
an oracle O. We will use square brackets to denote queries and mappings. For
instance, we write [e(ek, a)] to denote a query to e with input ek and a, whereas
e(ek, a) refers to the actual value of the function e on the given input. We write
[e(ek, a) = b] to denote that there is a mapping between a and b in the function
e(ek,·). Also, for ease of presentation, we will sometimes abuse the notation and
write O(α) to denote the answer of O on a query α, which depends on the type
of α. For example, if α = [e(ek, a)], then O(α) = e(ek, a).

Let O_k (with k ∈ {1,2}) be a partial oracle (also called a suboracle). We define
the set of all public keys that are contained in the queries of O_k as


Z(O_k) = {ek : [g(·) = ek] ∈ O_k or [e(ek,·) = ·] ∈ O_k}.
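
The set of public keys appearing in a partial oracle can be computed directly from its recorded mappings. The sketch below assumes a hypothetical encoding of a partial oracle as a list of tuples ("g", td, ek), ("e", ek, a, b) and ("d", td, b, a); this encoding is ours and is chosen only for illustration.

```python
def public_keys(partial_oracle):
    """Collect every ek that occurs in a g-mapping or an e-mapping."""
    keys = set()
    for record in partial_oracle:
        kind = record[0]
        if kind == "g":
            # Mapping [g(td) = ek]: the public key is the output.
            _, _td, ek = record
            keys.add(ek)
        elif kind == "e":
            # Mapping [e(ek, a) = b]: the public key is the first input.
            _, ek, _a, _b = record
            keys.add(ek)
        # d-mappings carry a trapdoor rather than a public key and are skipped.
    return keys


example = [("g", 3, 7), ("e", 9, 1, 4), ("d", 3, 4, 1)]
found = public_keys(example)
```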


Suboracles. Let O_1 and O_2 be two (possibly partial) trapdoor permutation
oracles. We write O_1 ⋄_c O_2 to denote the oracle that answers with O_1 only
if O_2 is not defined. Otherwise, it answers with O_2. If O_1 = (g_1,e_1,d_1) and
O_2 = (g_2,e_2,d_2) are two trapdoor permutation oracles as defined above, then
their composition is defined by composing each algorithm, namely:


O_1 ⋄_c O_2 = (g_1 ⋄_c g_2, e_1 ⋄_c e_2, d_1 ⋄_c d_2).


This definition needs some more explanation. We want the oracle obtained
from the composition of two oracles to preserve the properties of the two
individual oracles. In particular, we require that (e_1 ⋄_c e_2)(ek,·) is a
permutation for any valid ek. The problem is that the permutations e_1 and e_2
may contain collisions, namely there may exist ek and two distinct values
a, a' ∈ {0,1}^n such that e_2(ek, a) = e_1(ek, a'). To handle such collisions we
use the same technique suggested in [83]. We define e = e_1 ⋄_c e_2 as follows:
let ek, a, b be values such that [e_2(ek, a) = b] ∈ O_2. We set e(ek, a) = b. If
there exists a value a' ≠ a such that [e_1(ek, a') = b] ∈ O_1, then let
b' = e_1(ek, a) and set e(ek, a') = b'. The composition d = d_1 ⋄_c d_2
is defined to be consistent with g and e.
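
The collision rule for composing the permutation parts can be illustrated for a single fixed ek. The sketch below is a simplified reading under stated assumptions: e1 is a total permutation given as a dictionary, e2 is a partial injective mapping, and several overriding entries are processed one after another against the current table (the text only spells out the single-entry case). The function name is hypothetical.

```python
def compose_permutations(e1, e2):
    """Override e1 with the mappings of e2 while keeping a permutation."""
    e = dict(e1)
    for a, b in e2.items():
        if e[a] == b:
            continue  # Already consistent, nothing to repair.
        # a_prime is the input that currently maps to b (the collision).
        a_prime = next(x for x, y in e.items() if y == b)
        # Give a_prime the old image of a, then install the mapping from e2.
        e[a_prime] = e[a]
        e[a] = b
    return e


e1 = {0: 1, 1: 2, 2: 3, 3: 0}
single = compose_permutations(e1, {0: 3})
double = compose_permutations(e1, {0: 3, 1: 0})
```

For the single overriding entry [e_2(a) = b] with a = 0 and b = 3, the colliding input is a' = 2, which receives b' = e_1(a) = 1, exactly as in the rule described above.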


VUF in the Presence of Our Oracle. For a simpler exposition we make some
general assumptions on any VUF construction with access to the oracle O =
(g,e,d). First, we consider a slightly relaxed definition of the VUF algorithms
(KG, F, Π, V) as follows. The algorithm KG(SK) takes as input a secret key
SK ∈ {0,1}^n and outputs PK ∈ {0,1}^n. The inputs of F and Π are the secret
key SK and a value x ∈ {0,1}^n. The output of F is the function value
y ∈ {0,1}^n, whereas the output of Π is the corresponding proof π. Finally, V is
given as input the public key PK, an input x, an output y and a proof π, and
outputs 1 if it accepts the proof, or 0 otherwise. In the above description n is a
function of the security parameter λ.

Recall that we assume towards contradiction that there exists a black-box
reduction of VUFs to ATDPs. Then we denote by (KG^O, F^O, Π^O, V^O) the
corresponding VUF construction. According to our notation, each algorithm has
access to the (g,e,d) oracles and they have to use them in a “significant” way
to implement a secure primitive. Also, by definition of black-box reduction, this
construction is a correct VUF implementation that satisfies completeness and
uniqueness according to Definition 1.


Assumption 1. For a simpler exposition, in our proofs we use the following 
assumptions: 


– each algorithm is unbounded, but makes at most q = poly(λ) oracle queries
during its execution;

– every query d(td,·) is followed by a query g(td);

– the proof algorithm is deterministic;

– the verification algorithm is deterministic;

– the completeness of the VUF holds in a perfect sense.


Before proceeding with the description of the breaking oracle, we briefly justify
these assumptions. The first condition is reasonable because we consider only
efficient constructions and, moreover, it allows us to easily quantify the
advantage of our adversaries. The second one avoids queries of the adversary to
d(·,·) using a trapdoor key without knowing the corresponding public key. This
assumption is also common and has been previously used, e.g., in [23]. Assuming
that the proof algorithm is deterministic is not a restriction, as we can turn any
VRF with a probabilistic proof algorithm into one having a deterministic
algorithm by applying a PRF to the input and the private seed of the VRF to
derive the randomness. Completeness and uniqueness follow easily from those of
the VRF (note that uniqueness only holds w.r.t. the output of the function and
not w.r.t. the proof). The rest follows easily by applying a standard hybrid
argument. The assumptions on deterministic verification and perfect
completeness have already been addressed in [17], hence we omit the discussion
here.


A Formal Definition of B. Here, we provide a formal description of our oracle
B, which is composed of the following two algorithms (B_1, B_2):


Algorithm B_1:
INPUT: A collection of oracle circuits VUF^O = (KG^O, F^O, Π^O, V^O)
implementing a VUF, and a VUF public key PK.
OUTPUT: x_1, ..., x_ℓ ∈ {0,1}^n.
COMPUTATION: To each input (VUF^O, PK), the algorithm B_1 associates
a random function f : {0,1}^n → {0,1}^n. For i = 1 to ℓ, it computes
x_i = f(i), and finally it returns x_1, ..., x_ℓ.
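
The behaviour of the first algorithm can be sketched as a lazily sampled table: each distinct input gets its own random values f(1), ..., f(ℓ), fixed on first use and replayed afterwards. This is a minimal sketch under our own assumptions: the circuit collection is represented by any hashable description (for instance a string), n-bit strings are integers, and the class name and interface are hypothetical.

```python
import secrets


class B1:
    """Lazily sampled stand-in for the first algorithm of the oracle B."""

    def __init__(self, n, ell):
        self.n = n        # bit length of the VUF inputs
        self.ell = ell    # number of challenge points returned per query
        self._tables = {}

    def query(self, vuf_description, pk):
        key = (vuf_description, pk)
        if key not in self._tables:
            # First query on this input: fix f(1), ..., f(ell) at random.
            self._tables[key] = tuple(
                secrets.randbits(self.n) for _ in range(self.ell)
            )
        # Repeated queries on the same input return the same points.
        return self._tables[key]


b1 = B1(n=16, ell=5)
points = b1.query("toy-vuf", "toy-pk")
```

Since f is a random function rather than a permutation, the returned points are not guaranteed to be distinct in this sketch.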


Algorithm B_2:

INPUT: A collection of oracle circuits VUF^O = (KG^O, F^O, Π^O, V^O)
implementing a VUF, a VUF public key PK and a set {(x_i, y_i, π_i)}_{i=1}^ℓ such
that x_i ∈ {0,1}^n, y_i ∈ {0,1}^n, and π_i is in the range of Π(·,·).

OUTPUT: x* ∈ {0,1}^n, y* ∈ {0,1}^n.

COMPUTATION: The oracle performs the following computation:

– Step 1: Invoke (x'_1, ..., x'_ℓ) ← B_1(VUF^O, PK) and check that the
values x_1, ..., x_ℓ received as input are equal to (x'_1, ..., x'_ℓ) returned
by B_1. Otherwise, output ⊥.

– Step 2: For all i = 1 to ℓ run the algorithm V^O(PK, x_i, y_i, π_i) and
collect into a partial oracle O_Q all the queries that are made during
each run. If there is some i such that the verification algorithm does
not accept, stop and output ⊥.

– Step 3: Find a secret key SK' and a partial oracle O' such that:

1. KG^{O'}(SK') = PK, F^{O'}(SK', x_i) = y_i and Π^{O'}(SK', x_i) = π_i
for all i = 1 to ℓ.
2. O' ⊇ O_Q and |O'| ≤ |O_Q| + q, where q is the same value defined
in Assumption 1.

– Step 4: Define O'' = O ⋄_c O'.

– Step 5: Choose x* uniformly at random in {0,1}^n such that x* ≠ x_i
for all i = 1 to ℓ. Run y* ← F^{O''}(SK', x*) and π* ← Π^{O''}(SK', x*).

– Step 6: Run V^{O''}(PK, x*, y*, π*). If V^{O''} asks a query α such that
O''(α) ≠ O(α), then return ⊥. Otherwise output (x*, y*).


Complexity of B. Based on Assumption 1 we evaluate the cost of each query
to B in terms of queries to the oracle O. Since the function f chosen by B_1 is
completely independent of O, we do not count its cost. Instead, a query to B_2
counts ℓq + 3q + |O'| queries to O in total. This cost is obtained as follows:
Step 2 makes ℓq queries as it evaluates V ℓ times, Step 3 is made offline, Step 4
counts |O'| queries that are needed to perform the ⋄_c operation, and finally
Step 5 and Step 6 require 2q and q queries, respectively.


4 Insecurity of VUFs Relative to Our Oracles 


In this section we formally show that for every candidate black-box construction
(KG^O, F^O, Π^O, V^O) of a VUF from ATDP there is an efficient adversary A that
breaks the unpredictability of the VUF with non-negligible probability 1 − δ by
making a polynomial number of oracle queries to (O, B).

Let q be the maximum number of oracle queries that can be made by the
VUF algorithms (according to Assumption 1) and let c ∈ N be a sufficiently large
constant specified below. Without loss of generality, in the following proof we
assume q > 2 and we fix c such that δ < aot and our adversary has
non-negligible advantage at least 1 − δ. Also we set ℓ = q^c.

Our adversary A works as follows: 


INPUT: A public key PK and access to the function oracles F(SK,·), Π(SK,·).
OUTPUT: x*, y* ∈ {0,1}^n.
ALGORITHM: Our algorithm performs the following steps:

1. Query B_1 on input (KG^O, F^O, Π^O, V^O), PK and obtain x_1, ..., x_ℓ.

2. Query the VUF oracles F(SK,·), Π(SK,·) on x_i for all i = 1 to ℓ. Let
{y_1, π_1, ..., y_ℓ, π_ℓ} be the values obtained from such queries.

3. Query B_2 on input (KG^O, F^O, Π^O, V^O), PK, {x_1, y_1, π_1, ..., x_ℓ, y_ℓ, π_ℓ}.

4. If B_2 returns ⊥, then halt and fail. Otherwise, if B_2 returns (x*, y*), then
output (x*, y*).
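
The control flow of the adversary can be summarised in a few lines. The sketch below only mirrors the four steps and is not an implementation of the oracles themselves: the two parts of B and the two VUF oracles are passed in as callables, ⊥ is modelled by None, and all parameter names are hypothetical.

```python
def run_adversary(pk, vuf_description, b1, b2, f_oracle, pi_oracle):
    """Mirror the four steps of the adversary A; returns None on failure."""
    # Step 1: obtain the challenge points from the first part of B.
    xs = b1(vuf_description, pk)
    # Step 2: ask the VUF oracles for the value and the proof on each point.
    triples = [(x, f_oracle(x), pi_oracle(x)) for x in xs]
    # Step 3: hand the answered triples to the second part of B.
    answer = b2(vuf_description, pk, triples)
    # Step 4: fail if B rejected, otherwise output the forgery.
    if answer is None:
        return None
    x_star, y_star = answer
    return x_star, y_star


# Toy stubs, used only to exercise the control flow.
seen = []


def b1_stub(vuf_description, pk):
    return (1, 2, 3)


def b2_stub(vuf_description, pk, triples):
    seen.append(triples)
    return (99, 109)


result = run_adversary(
    "PK", "VUF", b1_stub, b2_stub, lambda x: x + 10, lambda x: ("proof", x)
)
```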
Then we are able to state the following lemma:


Lemma 1. The adversary A defined above with input PK and oracle access to
(O, B) wins the unpredictability experiment with probability at least 1 − aai and
makes at most 2q^{c+1} + 4q oracle queries.


The proof is given in the full version [14]. 
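
As an informal sanity check of the query bound only (the success probability is not addressed here), one can combine the per-query cost of the second algorithm of B with the size constraint of its Step 3. The sketch assumes that only this single query to B is charged in terms of queries to O, and it writes O_Q for the partial oracle collected in Step 2; both are our reading and not a statement taken from the full proof.

```latex
\begin{align*}
|O_Q| &\le \ell q
  && \text{($\ell$ runs of $V$, at most $q$ queries each)} \\
|O'| &\le |O_Q| + q \le \ell q + q
  && \text{(constraint in Step 3)} \\
\ell q + 3q + |O'| &\le 2\ell q + 4q = 2q^{c+1} + 4q
  && \text{(using $\ell = q^{c}$)}
\end{align*}
```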


5 Security of ATDPs Relative to Our Oracles 


In this section we show the existence of a trapdoor permutation (G^O, E^O, D^O)
that is adaptively one-way even against adversaries that have access to B. The
construction is straightforward, as each algorithm forwards its input to the
corresponding oracle, namely: G^O(td) = g(td), E^O(ek, a) = e(ek, a) and
D^O(td, b) = d(td, b).

By the randomness of the oracle O, it is easy to see that the above construction
is a secure ATDP when the adversary is given access only to O. Therefore, in
order to prove its security relative to the oracle B, we will show that B does not
help to break the one-wayness of (G^O, E^O, D^O), namely that B can be simulated
by the adversary A. Now we can state the following lemma:


Lemma 2. Let (G^O, E^O, D^O) be an adaptive trapdoor permutation where each
algorithm forwards its input to g, e, and d respectively. Then, for every
adversary A that has access to (O, B) and makes at most q oracle queries, there is
a sufficiently large λ such that the probability that A succeeds in the adaptive
one-wayness experiment against the above construction is at most negligible.


5.1 Defining the Simulator 


Recall that the main idea is to show that A can simulate the oracle B locally. To
do so, we show that for every A, there exists a simulator S that gets the same
input as A, but which does not have access to B. We then show that the success
probability of S is close to that of A.


Intuition for the Simulator. In the first step, the simulator generates a random
trapdoor permutation oracle O_S locally, except for the portion concerning
the permutation e(ek*,·). In particular, O_S is defined progressively by choosing
its answers uniformly at random. Moreover, we construct S such that it collects
into a partial oracle O* all the queries of the form [e(ek*,·)] that A makes during
the simulation. This way, S knows the trapdoors of all the public keys
(except ek*) and is therefore able to evaluate all inversion queries d(td,·) where
g(td) ≠ ek*.
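
Defining an oracle progressively is commonly done by lazy sampling: an answer is chosen at random the first time a point is queried and is then remembered. The sketch below shows this for a single permutation over a toy domain. It is an illustration under our own assumptions rather than the simulator from the full version: the class name is hypothetical, and the free values are enumerated explicitly, which is feasible only for very small n.

```python
import random


class LazyPermutation:
    """A random permutation over n-bit integers, sampled on demand."""

    def __init__(self, n, rng=None):
        self.size = 2 ** n
        self.rng = rng or random.Random()
        self.forward = {}   # recorded mappings a -> b
        self.backward = {}  # recorded mappings b -> a

    def _fix(self, a, b):
        self.forward[a] = b
        self.backward[b] = a

    def evaluate(self, a):
        if a not in self.forward:
            # Choose uniformly among the outputs that are still unused.
            free = [b for b in range(self.size) if b not in self.backward]
            self._fix(a, self.rng.choice(free))
        return self.forward[a]

    def invert(self, b):
        if b not in self.backward:
            # Choose uniformly among the inputs that are still unused.
            free = [a for a in range(self.size) if a not in self.forward]
            self._fix(self.rng.choice(free), b)
        return self.backward[b]


perm = LazyPermutation(n=3, rng=random.Random(0))
preimage_of_five = perm.invert(5)
```

Keeping both directions in the same pair of tables ensures that forward and inverse answers remain consistent however the queries are interleaved.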

The first three steps of the algorithm B_2 can easily be simulated as in the real
case. The first difference comes up in Step 4, where S has to define the oracle
O''. The difficulty here is that the simulator does not know the entire O and thus
it cannot compute the composition O ⋄_c O'. We solve this problem using an idea
similar to the one used in [33]. Namely, we define O'' such that it is consistent
with the partial oracles that are known to S so far (i.e., O_S, O* and O') and
we forward all other queries to O. This solves most of the problematic cases due
to the fact that the adversary A only knows queried mappings (which are also
known to S since it has stored all of them).

One remaining issue is the queries [d(td', b)] such that td' is the trapdoor
that is “virtually” associated to ek* (i.e., [g(td') = ek*] ∈ O') and there is no
known mapping [e(ek*,·) = b] in O*. Indeed, recall that the simulator does not
know the real trapdoor td* such that [g(td*) = ek*] ∈ O, and also notice that
forwarding these unknown queries to O would inevitably lead to an inconsistent
mapping. Assume for example that α = [d(td', b)] is answered with O(α) = a.
Then we have a mapping [e(ek*, a) = b] ∈ O'', but it is very unlikely that
[e(ek*, a) = b] is in O. Such inconsistencies could potentially be discovered in
Step 6, which would cause the simulation to output ⊥ while it should not.

Fortunately, we show how to handle such queries by using the external
inversion oracle I(ek*,·). Finally, the last remaining problem is the query α =
[d(td', b*)]. We cannot answer this query correctly (at least as long as the inverse
of b* has not been discovered before); however, we will show that this case only
happens with negligible probability. The main idea is that either A cannot
provide an accepting input to B_2 or (in the case that we have passed all the
checks and have reached Step 5) the probability that this query cannot be
answered is very small.

The full description of the simulator and the proof are provided in the full
version [14].